Abstract

A growing and important class of traffic in the Internet is so-called
“streaming media,” in which a server transmits a packetized multimedia
signal to a receiver that buffers the packets for playback. This
playback buffer, if adequately sized, counteracts the adverse impact of
delay and reordering suffered by packets as they traverse the network.
If large enough, that buffer can additionally provide adequate delay
for the receiver to request that the source retransmit lost packets
before their playback deadline expires. We call this framework for
retransmitting lost streaming-media packets “Soft ARQ” since it
represents a relaxed form of Automatic Repeat reQuest (ARQ). While
schemes for streaming media based on Soft ARQ have been previously
proposed, no work to date systematically addresses two important
questions induced by Soft ARQ: (1) at any given point in time, what is
the optimal packet to transmit? And, (2) when and how does a receiver
generate feedback to the source? In this paper, we address both of
these questions with a framework for streaming media retransmission
based on layered media representations, in which a signal is decomposed
into a discrete number of layers and each successive layer provides
enhanced quality. In our approach, the source chooses between
transmitting (1) older but lower-quality information and (2) newer but
higher-quality information using a decision process that minimizes the
expected signal distortion at the receiver. To this end, we develop a
model of our streaming media system based on a binary erasure channel
with instantaneous feedback and use Markov-chain analysis to derive the
optimal strategy. Based on this analysis, we propose a practical
transmission protocol for streaming media that performs
close-to-optimal retransmission and can adapt to dynamic network
conditions. To demonstrate the efficacy of this protocol, we simulate
our system and present results that illustrate significant performance
benefits both from layering the media signal and adaptively estimating
a retransmission deadline.

1  Introduction 

A common class of traffic on the Internet is so-called “streaming
media,” where real-time signals like audio and video are delivered from
a server somewhere in the network to a human user who interactively
views the material. Unlike human-to-human communication, which requires
relatively tight and consistent end-to-end delays for good interactive
performance [6], server-to-human communication can afford a certain
level of artificial delay. As a result, streaming media applications
often have sufficient time to recover from lost packets through
retransmission and thereby avoid unnecessary degradation in
reconstructed signal quality. We refer to this delay-constrained
Automatic Repeat/reQuest system as “soft ARQ,” because it represents a
relaxed form of ARQ in which the successful on-time delivery of every
packet is not guaranteed.

Soft ARQ has been exploited in research protocols like STORM [30] and
MESH [17] and in commercial products like RealNetworks clients and
servers. These prior works have focused on how to choose the playout
delay and how to decide if retransmissions will arrive in time.
However, they assume that when the sender wants to retransmit a packet
it will be able to. This is not necessarily true when there are rate
constraints on the sender. In this case, the sender has to consider not
only whether a (re)transmitted packet will arrive in time, but also
whether that packet is more beneficial than other unsent packets. There
is no existing solution to the problem of how a sender optimally
chooses what available data to retransmit when the receiver indicates
loss.

In this paper, we propose a framework to solve this optimization
problem. In our scheme, the sender represents its signal in a layered
format and at any given time transmits the “most important” information
conditioned on receiver feedback and constrained by the available bit
rate (which is either pre-configured or inferred from a companion
congestion control algorithm [25]). We assume and study the case that
packet loss is high and the source rate of the highest-quality version
of the signal exceeds the available capacity. If the packet loss rate
is sufficiently low that the effective channel capacity is larger than
the source rate, then simple ARQ would more or less suffice and our
problem would be solved.

Figure 1 illustrates our model for a streaming layered multimedia
transmission system. The transmission process begins with a multimedia
signal $X$ at the sender. We assume that the entire signal is not
available prior to the start of transmission; in other words, it is
either generated or retrieved from storage concurrently while the
transmission process is going on. The signal is segmented in time into
equal-length segments or “frames”; these frames are produced
periodically as the signal is generated. We denote frame $n$ as $X^n$.
The signal is also encoded into a hierarchy of $N$ layers
$\{X_1, X_2, \ldots, X_N\}$, where $X_1$ is the most “important” layer
and $X_N$ is the least “important” layer, and we assume that all layers
have the same bit-rate. We assume that the importance of a layer can be
quantified, so that successfully transmitting the most important layer
of a frame results in a greater benefit (e.g., a greater increase in
signal quality or decrease in distortion) than a less important layer
of that frame. We denote the $i$th frame of layer $l$ by $X_l^i$. These
layer/frame segments form the basic transmission units or “messages”
that are sent across the network (e.g., contained in packets). The
sender operates under a transmission rate constraint, which manifests
itself as a lower bound on the time between message transmissions.

Figure 1: System diagram of layered transmission over a binary erasure
channel with feedback.

To capture the effect of network packet losses, each message passes
through a binary erasure channel (BEC) on its way to the receiver. The
BEC either erases (drops) a packet with probability $\epsilon$ or
successfully transmits the packet with probability $1 - \epsilon$. The
BEC, in conjunction with an instantaneous feedback path, serves as an
idealized model for the network. We assume a positive acknowledgment
(ACK-based) scheme is used for retransmission requests. Messages that
successfully reach the receiver are used to reconstruct the signal.
Because we have assumed a “streaming” multimedia scenario, the receiver
starts playback of the signal even while it is still being generated
and transmitted at the source. At some fixed time after frame $i$ is
produced at the source, it is reconstructed from whatever layers
$X_l^i$ have arrived at the receiver and played back.
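
The channel model above can be sketched in a few lines of Python. This
is only an illustration under stated assumptions, not part of the
protocol: the function name `deliver_before_deadline`, the `attempts`
parameter (the number of transmission opportunities left before the
playback deadline), and the seeded pseudo-random generator are all
hypothetical choices made for the example.

```python
import random


def deliver_before_deadline(eps, attempts, rng):
    """Simulate (re)transmitting one message over a binary erasure channel.

    eps      -- erasure probability of the channel
    attempts -- transmission opportunities left before the playback deadline
    rng      -- a random.Random instance, so that runs are reproducible

    Returns the 1-based attempt on which the message got through, or None
    if every attempt was erased. Instantaneous feedback is modeled by
    stopping as soon as one attempt succeeds.
    """
    for attempt in range(1, attempts + 1):
        erased = rng.random() < eps  # erasures are independent
        if not erased:
            return attempt
    return None


rng = random.Random(0)
trials = 20000
delivered = sum(
    deliver_before_deadline(0.3, 2, rng) is not None for _ in range(trials)
)
# With eps = 0.3 and two opportunities, the fraction delivered should be
# close to 1 - 0.3**2 = 0.91.
print(delivered / trials)
```

The simulation stops retransmitting at the first success because the
model assumes the sender learns the outcome of each attempt immediately.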

An important component of our model is the transmission policy, located
at the sender. This policy dictates which message (frame and layer) the
source should transmit (or retransmit) for any possible situation. For
every feasible set of unsent (or sent but dropped) messages and their
corresponding playback deadlines (i.e., the latest time they can be
sent before they are no longer useful to the receiver), the
transmission policy contains a rule indicating which message the sender
should choose to transmit next. Our problem is to find the transmission
policy which optimizes the quality of the delivered media signal. We
define the optimal policy $\pi^*$ as the transmission policy that
minimizes the distortion of the signal reconstructed from the
successfully received messages.

The need for a policy stems from the fact that messages can have both
different priorities (due to the layering) and different time
constraints (due to the framing and streaming playback). Decisions of
what to transmit so as to maximize signal quality are simple when the
choice is restricted to messages from within a single frame: the sender
should (re-)transmit the most important layer of that frame that has
not been successfully transmitted yet. Likewise, decisions about which
message to transmit among all of the messages of a single layer are
also clear: (re-)transmit the oldest message of that layer that will
still arrive in time for playback. However, the decision is not
necessarily clear when choosing between messages from different frames
and different layers. Specifically, how do you decide between sending
an older, lower priority message $O_L$ and a newer, high priority
message $N_H$? (With the above terminology, if $O_L = X_l^i$ and
$N_H = X_m^j$, then we must have $l > m$ and $i < j$.) There are
fundamental tradeoffs between the data’s importance and its time
constraints. A reason to favor $O_L$ is that it has an earlier playback
point than $N_H$, so there is less time and hence fewer opportunities
in which to successfully transmit it. However, an argument for choosing
message $N_H$ is that it is part of a more important layer and hence
provides a greater distortion reduction than $O_L$. Choosing to send
the less important $O_L$ leaves fewer transmission opportunities for
the more important $N_H$; if the loss rate is high it may take all of
those opportunities to successfully transmit $N_H$. Thus it is not
immediately clear which choice is better, i.e., which choice results in
a higher average signal quality. As a result, the sender relies on the
transmission policy to tell it what choice to make.
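
A back-of-the-envelope calculation makes this tradeoff concrete. With
independent erasures, a message that has $k$ remaining opportunities is
delivered with probability $1 - \epsilon^k$. The sketch below compares
two naive strategies for a hypothetical scenario in which the older
message has exactly one opportunity left and the newer message has
`slots_new` opportunities left. The function names, the
distortion-reduction weights `gain_old` and `gain_new`, and the scenario
itself are assumptions for illustration; this is not the optimal policy
derived in Section 2.

```python
def delivery_probability(eps, k):
    """Probability that at least one of k independent attempts is not erased."""
    return 1.0 - eps ** k


def expected_gains(eps, gain_old, gain_new, slots_new):
    """Expected distortion reduction of two naive strategies.

    Strategy A: spend the current slot on the older message (its last
    chance), then use the remaining slots_new - 1 slots for the newer one.
    Strategy B: ignore the older message and use all slots_new slots for
    the newer one.
    """
    send_old_first = (
        delivery_probability(eps, 1) * gain_old
        + delivery_probability(eps, slots_new - 1) * gain_new
    )
    send_new_only = delivery_probability(eps, slots_new) * gain_new
    return send_old_first, send_new_only


# Moderate loss: trying the older message first is better in this toy case.
print(expected_gains(0.5, gain_old=1.0, gain_new=3.0, slots_new=3))
# Heavy loss: the newer, more important message needs every opportunity.
print(expected_gains(0.9, gain_old=1.0, gain_new=3.0, slots_new=3))
```

In this toy setting the preferred choice flips as the erasure rate
grows, which is why a single fixed rule of thumb is not enough.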

A transmission policy consists of a set of decisions like the one
above, a set which covers all possible choices that may need to be
made. In other words, given all possible sets
$\mathcal{L} = \{l_1, \ldots, l_n\}$ of layers which may be transmitted
and a set $\mathcal{T} = \{t_1, \ldots, t_n\}$ corresponding to how
much time remains before they expire, the transmission policy dictates
which layer $l_i \in \mathcal{L}$ will be transmitted. In Section 2 we
consider ways of finding the optimal transmission policy $\pi^*$ that
results in the lowest average distortion in the reconstructed signal.
We develop a Markov chain analysis of this layered transmission system
and show that the optimal policy $\pi^*$ depends on many factors, such
as the frame lifetime (a function of the one-way network and playout
delays), the frame rate (time between frames), the erasure rate, and
the relative importance of the layers. We perform the Markov analysis
for the case that these factors are fixed and do not vary over time,
and we use the resulting analysis to compute the average distortion
incurred by a given policy. The optimal policy $\pi^*$ is found by
searching the distortions of all possible policies. A key result is
that transmission decisions of $\pi^*$ are time-invariant and thus do
not change as the layers approach their playback times. For the
transmission tradeoff mentioned above, this result means that if the
best policy dictates $O_L$ should be sent instead of $N_H$ at some time
$t$, and this attempt is erased, then $O_L$ would be chosen again and
retransmitted. It might seem logical to “give up” on $O_L$ at some
point and concentrate on the more important $N_H$. However, our results
indicate that this is not true, and $O_L$ should continue to be chosen
and retransmitted in lieu of $N_H$ until it either successfully reaches
the receiver, or enough time passes that transmissions of $O_L$ can no
longer reach the receiver in time for playback.

Having found the optimal policy when network conditions are static, in
Section 3 we consider methods of adapting the transmission protocol
when the erasure rate and playback point (data lifetime) vary over
time. We describe an algorithm for estimating the erasure rate which is
based on the technique in the Real-time Transport Protocol (RTP) [26],
and we also present a novel algorithm for estimating the data lifetime.
We conclude Section 3 with a protocol for changing the transmission
policy according to the current estimates of the data lifetime and
channel erasure rate.

We evaluate our protocol through network simulation and present the
results in Section 4. By comparing our protocol’s performance to
algorithms which do not use layering, do not adaptively estimate the
lifetime, or do not change the transmission policy as the erasure rate
changes, we quantify the importance of each of these techniques to the
overall protocol performance. We find that layering the data and
accurately estimating the data lifetime provide substantial performance
improvements. We also find that adapting the transmission policy (i.e.,
the decisions to favor older, less important layers over newer ones,
and vice versa) as the erasure rate and lifetime change yields only
marginal improvements.

Section 5 describes related work in areas that include soft ARQ,
alternative methods for increasing reliability of time-limited data
transmissions, and mathematical analysis of problems similar to this
one. Section 6 describes areas for future work, and concluding remarks
are given in Section 7.

2  Analysis 

We now present a formal analysis for the layered transmission system
described above and illustrated by Figure 1. The problem is to find the
best transmission policy (i.e., set of transmission decisions) for a
given set of known parameters, such as the packet erasure probability
and data lifetime. In order to solve this problem, we break it down
into parts. First, we formalize the parameters of our layered
transmission system and define a state space which captures its
dynamics: what layers of what frames have already been transmitted, how
long before each frame expires, etc. We then apply Markov chain
analysis to find the steady-state behavior of the transmission system.
From the steady-state analysis we obtain a distribution on the number
of layers per frame that are successfully received before the frame
expires. We then combine this information with a cost function (e.g., a
rate-distortion curve) to find the average cost associated with a
specific transmission policy. Finally, we obtain the optimal policy by
searching over all possible policies to find the one with the lowest
cost.

2.1  Variable  and  State  Space  Definitions 

Our  transmission  model  consists  of  a  multimedia  signal 
transmitted  from  the  source  to  a  receiver  over  a  binary 
erasure  channel  with  feedback.  We  make  the  following 
assumptions  and  definitions  with  regard  to  this  model: 

• The channel erases a packet sent from sender to receiver with
probability $\epsilon$. Erasures are independent.

• We ignore transmission delay in both directions between sender and
receiver. The sender has instantaneous feedback and knows if the packet
just transmitted was erased and needs retransmission.

• The multimedia signal is segmented in time into frames that are
generated periodically every $T$ time units.

• Each frame is further encoded into a hierarchy of $N$ layers.

• Unit time is the length of time it takes to transmit one message (one
layer of one frame), and one second denotes one time unit.

• $T \geq N$, so that there is at least one chance to transmit each
layer of every frame.

• Each frame has a lifetime at the sender of $L$ seconds; any messages
sent more than $L$ seconds after the frame is produced will arrive too
late for playback at the receiver. We say that a frame produced at time
$t$ “expires” at the sender at time $t + L$. Because we have assumed
there is no network delay, this lifetime is solely a function of delays
at the receiver: specifically, $L$ is the playback delay less any
processing delays.

• $L > T$, so that there is at least some overlap in the lifetimes of
consecutive frames. This leads to situations requiring a non-obvious
decision between transmitting a less important message of an older
frame and a more important message of a newer frame.

• The maximum number of frames “alive” at any time is $K$. A live frame
is one that has already been generated but has not yet expired. $K$ is
further explained below.

    Variable   Meaning
    --------   ------------------------------------
    $L$        frame lifetime
    $T$        period of frame production
    $N$        number of layers per frame
    $K$        maximum number of frames “alive”

Table 1: Summary of transmission model variables.

    Variable             Meaning                                      Eqn
    ------------------   ------------------------------------------   ---
    $S_t$                transmission state at time $t$               2
    $\phi^{(t)}$         phase within a $T$-length cycle              3
    $\mathbf{n}^{(t)}$   $K$-tuple of the transmission state of the   4
                         currently live frames, $n_i^{(t)}$
    $n_i^{(t)}$          number of frame $i$'s layers successfully    -
                         sent by time $t$

Table 2: Summary of the state space variables’ definitions and relevant
equation numbers.

Because all of the frames of the multimedia signal are not available to
the sender at the start of the transmission (the signal’s frames are
produced periodically), and because each frame only has $L$ seconds
after it is produced to be sent to the receiver, there is a finite
limit on the number of frames whose layers can be considered valid
candidates for transmission. This maximum number of frames $K$ alive at
any given time is a function of how long they live ($L$) and how
frequently they are produced ($T$), and is given by:

$$ K = \left\lceil \frac{L}{T} \right\rceil \qquad (1) $$

Table 1 summarizes the definitions of the above variables.

After the first $K$ frames have been produced, a new frame is produced
and an old frame expires once every $T$ seconds. Let $\phi$ be the
phase (position) within a $T$-length cycle, so that
$\phi \in \{0, 1, \ldots, T - 1\}$, and let the cycle start at
$\phi = 0$ when a new frame is produced. Note that if the lifetime $L$
is not an exact multiple of $T$, the oldest frame will expire at phase
$\phi = L - (K - 1)T$, before the next new frame is produced. In this
case there will be only $K - 1$ frames alive during the last
$(KT - L)$ seconds of a cycle.
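
The relation between $L$, $T$, $K$, and the phase can be checked with a
small sketch. The function names below are hypothetical, and the code
assumes integer $L$ and $T$ measured in the unit time defined above; it
treats the oldest frame as alive for phases strictly before
$L - (K - 1)T$.

```python
def max_live_frames(L, T):
    """K = ceil(L / T), computed with integer arithmetic."""
    return -(-L // T)


def live_frames_at_phase(L, T, phi):
    """Number of live frames at phase phi of a T-length cycle."""
    K = max_live_frames(L, T)
    expiry_phase = L - (K - 1) * T  # phase at which the oldest frame expires
    return K if phi < expiry_phase else K - 1


# L = 5, T = 2: K = 3, and the oldest frame expires at phase 1, so only
# K - 1 = 2 frames are alive during the last K*T - L = 1 time unit.
print([live_frames_at_phase(5, 2, phi) for phi in range(2)])
# L = 4, T = 2: L is an exact multiple of T, so K = 2 frames are always alive.
print([live_frames_at_phase(4, 2, phi) for phi in range(2)])
```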


In deciding which message to transmit next at any given time $t$, the
sender must consider not only which messages of the $K$ currently live
frames have been transmitted, but how much time remains before each of
these frames expires. However, the sender does not need to consider
(and hence, remember) any information about the older expired frames in
order to make its decision. Because these frames have expired, there is
no point in sending any of their untransmitted messages, and thus there
is no need to remember their specific expiration times. Also, although
we may be able to infer the channel erasure rate through knowledge of
how many layers of these frames were successfully transmitted, we have
assumed that we already know the erasure rate and hence this knowledge
is not needed to make the current transmission decision.¹

We can now define a state $S_t$ that summarizes the information the
sender needs to make a transmission choice at time $t$. Let $S_t$ be
defined as:

$$ S_t = \left(\phi^{(t)}, \mathbf{n}^{(t)}\right) \qquad (2) $$

$$ \phi^{(t)} = t \bmod T \qquad (3) $$

$$ \mathbf{n}^{(t)} = \left[n_1^{(t)}, n_2^{(t)}, \ldots, n_K^{(t)}\right] \qquad (4) $$

where $n_i^{(t)}$ is the number of successfully transmitted layers of
the $i$th-oldest live frame at time $t$ (i.e., frame 1 is the oldest,
frame $K$ is the newest). We omit the $t$ superscript from $\phi$,
$\mathbf{n}$, and $n_i$ when its context is clear. Because there are
$N$ layers, $0 \leq n_i^{(t)} \leq N$. These state space components are
summarized in Table 2.

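The phase $\phi$ tells us how much time is left before each frame
expires. For example, at the beginning of a cycle ($\phi = 0$) frame
$K$ is produced, and so we know it expires in $L$ seconds. More
generally, let $\mathrm{ttl}_i$ be frame $i$'s “time-to-live,” i.e.,
how much time remains before it expires. With frame 1 the oldest and
frame $K$ the newest, it is calculated as:

$$ \mathrm{ttl}_i = L - \phi - (K - i)T. \qquad (5) $$

This gives $\mathrm{ttl}_K = L$ at $\phi = 0$ and $\mathrm{ttl}_1 = 0$
at $\phi = L - (K - 1)T$, the phase at which the oldest frame expires.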
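
The time-to-live computation can be sketched as below. The function
name is hypothetical, and the indexing is an assumption taken from the
state definition: frame 1 is the oldest live frame and frame $K$ the
newest, so that the newest frame has its full lifetime $L$ at phase 0.

```python
def time_to_live(i, phi, L, T):
    """Time remaining before frame i expires (1 = oldest, K = newest)."""
    K = -(-L // T)  # K = ceil(L / T) for integer L and T
    # Frame i was produced (K - i) cycles before the newest frame.
    return L - phi - (K - i) * T


# L = 5, T = 2 gives K = 3 live frames.
print(time_to_live(3, 0, 5, 2))  # newest frame at the start of a cycle
print(time_to_live(1, 0, 5, 2))  # oldest frame at the start of a cycle
```

A frame whose time-to-live has reached zero is no longer a candidate for
transmission under this convention.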

¹ Because the sender typically does not know the erasure rate, we
consider ways of estimating the erasure rate in Section 3.3; however,
our goal here is to try to find the optimal policy when the sender
possesses all information relevant to making transmission decisions.
The $K$-tuple $\mathbf{n}$ tells us exactly what layers of the $K$
frames have already been transmitted, and, conversely, which layers
remain for each frame. At any time $t$, the $n_i^{(t)}$ most important
layers of frame $i$ have been transmitted, and so there are
$N - n_i^{(t)}$ layers remaining. At the beginning of a cycle
($\phi = 0$) a new frame is produced, so $n_K = 0$. Also at this time,
all of the frames “age” one position in the $K$-tuple $\mathbf{n}$. To
see this, suppose that at a time $t$ at the start of one cycle
($t \bmod T = 0$), we have a state

$$ S_t = \left(0, \mathbf{n}^{(t)} = \left[n_1^{(t)}, n_2^{(t)}, \ldots, n_K^{(t)}\right]\right). $$

Now suppose that the next $T$ transmission attempts are all erased, so
that no frame gets any more messages across. For this case the next $T$
states are independent of our transmission policy (regardless of which
messages the policy dictated we tried to (re-)transmit, they were all
erased), and our state evolves with time as:

$$ S_{t+1} = (1, \mathbf{n}^{(t)}), \quad S_{t+2} = (2, \mathbf{n}^{(t)}), \quad \ldots, \quad S_{t+T-1} = (T-1, \mathbf{n}^{(t)}). $$

At time $t + T$, immediately following the $T$th erasure, a new frame
arrives and a new cycle begins. Because the oldest frame of the
previous cycle has expired by this time, we no longer track its state.
The new state at time $t + T$ is
$S_{t+T} = (0, \tilde{\mathbf{n}}^{(t)})$, where

$$ \tilde{\mathbf{n}}^{(t)} = \left[n_2^{(t)}, n_3^{(t)}, \ldots, n_K^{(t)}, 0\right]. \qquad (6) $$

The $\sim$ operator left-shifts each frame’s state one position to
reflect how each frame ages one position per cycle as one frame expires
and a new frame arrives. In this simple example, the values $n_i^{(t)}$
did not change (except for the position shifts) because all of the
transmission attempts were failures. Next, we consider how to analyze
the state evolution for the more general case when some transmissions
succeed and some are erased.

2.2  Markov  Chain  Analysis

In this section we present an analysis of the process
$S = \{S_0, S_1, S_2, \ldots\}$, which illustrates how the state space
evolves with time. We perform this analysis so we can find the
steady-state behavior of $S$; with this knowledge we can calculate the
expected distortion incurred by a particular policy $\pi$. The
steady-state behavior of $S$ depends on both the erasure rate of the
channel, which determines the chance of a successful transmission, and
our policy $\pi$, which dictates what layers of which frames should be
transmitted, or retransmitted, at any given time. To illustrate this
dependency we first examine how $S$ can change in a single time step.

Consider the possible transitions from a state $S_t$ to $S_{t+1}$. The
transition of the phase component $\phi$ of the state is completely
deterministic:

$$ \phi^{(t+1)} = \left(\phi^{(t)} + 1\right) \bmod T. \qquad (7) $$

As a result, we focus our attention on the transitions of the
transmission state vector $\mathbf{n}$. There are $K$ components in
$\mathbf{n}$, each of which can take on any of $N + 1$ values
($0 \leq n_i \leq N$), so the maximum number of possible values
$\mathbf{n}$ may take on is $M = (N + 1)^K$. However, there are only
two values that $\mathbf{n}^{(t+1)}$ may take on for a given value of
$\mathbf{n}^{(t)}$. To see this, first assume that at time $t$ we are
not at the end of a cycle: $\phi^{(t)} \neq T - 1$. The transmission
policy $\pi$ contains a rule for every state
$S_t = (\phi^{(t)}, \mathbf{n}^{(t)})$ which dictates what frame’s
layer should next be transmitted, or retransmitted if a previous
attempt has failed. If $\pi(S)$ is the frame that the policy dictates
be chosen for a state $S$, then at time $t$ the most important layer of
frame $\pi(S_t)$ not yet successfully transmitted would be sent. This
layer is the $\left(n_{\pi(S_t)}^{(t)} + 1\right)$-most important
layer, since the first $n_{\pi(S_t)}^{(t)}$ layers of frame $\pi(S_t)$
have already been transmitted. This transmission can either succeed or
be erased. If it is erased then $\mathbf{n}$ does not change; if the
transmission succeeds then $\mathbf{n}^{(t+1)}$ differs from
$\mathbf{n}^{(t)}$ in only one component:

$$ n_{\pi(S_t)}^{(t+1)} = n_{\pi(S_t)}^{(t)} + 1. \qquad (8) $$


Because the probability of an erasure is $\epsilon$, the one-step
transition probability is:

$$ P\left(\mathbf{n}^{(t+1)} \mid \mathbf{n}^{(t)}\right) =
\begin{cases}
\epsilon & \text{if } \mathbf{n}^{(t+1)} = \mathbf{n}^{(t)} \\
1 - \epsilon & \text{if } n_{\pi(S_t)}^{(t+1)} = n_{\pi(S_t)}^{(t)} + 1
  \text{ and } n_j^{(t+1)} = n_j^{(t)},\ j \neq \pi(S_t) \\
0 & \text{else.}
\end{cases} \qquad (9) $$

We encapsulate the one-step transition probabilities of all $M$
possible values of $\mathbf{n}^{(t)}$ in an $M \times M$ state
transition matrix $P_\phi$, where $\phi = t \bmod T$. Assume that we
have a function $f$ which maps each possible value of $\mathbf{n}$ to a
unique index $i \in \{1, \ldots, M\}$; for example, $f([1, 0, 0]) = 2$
and $f^{-1}(2) = [1, 0, 0]$. With this mapping function, the components
of $P_\phi$ are defined as:

$$ [P_\phi]_{i,j} = P\left(\mathbf{n}^{(t+1)} = f^{-1}(j) \mid \mathbf{n}^{(t)} = f^{-1}(i)\right), \qquad (10) $$

where the conditional probability can be found using Equation 9. Each
row $i$ of $P_\phi$ contains two non-zero elements: $\epsilon$ in
column $i$, and $1 - \epsilon$ in column $j$, where $j$ is determined
by the policy $\pi$.
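
The construction of one such matrix can be sketched as follows. This is
a minimal illustration under several assumptions: the index map `f` is
just one possible choice that happens to reproduce the example
$f([1, 0, 0]) = 2$ (first component varying fastest); the policy is
represented as a hypothetical Python callable returning a 0-based frame
position or `None`; all $K$ frames are treated as alive; and a state in
which the policy has nothing to send is given a self-loop of
probability 1.

```python
def f(n, N):
    """Map a K-tuple n to a unique index in 1..(N+1)**K."""
    return 1 + sum(n_i * (N + 1) ** pos for pos, n_i in enumerate(n))


def f_inv(i, N, K):
    """Inverse of f: recover the K-tuple from its 1-based index."""
    i -= 1
    n = []
    for _ in range(K):
        n.append(i % (N + 1))
        i //= N + 1
    return tuple(n)


def oldest_first(N):
    """Hypothetical policy: next missing layer of the oldest unfinished frame."""
    def policy(phi, n):
        for k, sent in enumerate(n):
            if sent < N:
                return k
        return None  # all live frames are complete
    return policy


def transition_matrix(policy, phi, N, K, eps):
    """Build the M x M one-step matrix for phase phi (not the end of a cycle)."""
    M = (N + 1) ** K
    P = [[0.0] * M for _ in range(M)]
    for i in range(1, M + 1):
        n = f_inv(i, N, K)
        k = policy(phi, n)
        if k is None or n[k] >= N:
            P[i - 1][i - 1] = 1.0  # nothing to send: state cannot change
            continue
        succ = n[:k] + (n[k] + 1,) + n[k + 1:]  # one more layer of frame k
        P[i - 1][i - 1] = eps                   # attempt erased
        P[i - 1][f(succ, N) - 1] = 1.0 - eps    # attempt delivered
    return P


P0 = transition_matrix(oldest_first(1), phi=0, N=1, K=2, eps=0.25)
for row in P0:
    print(row)
```

Every row sums to one, with at most two non-zero entries, matching the
structure described above.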
In our analysis so far we assumed that we were not at the end of a
cycle at time $t$. However, if we are at the end of a cycle
($\phi^{(t)} = T - 1$) the state transition matrix given by Equation 10
is not quite correct. It fails to account for the arrival of a new
frame and the aging of each frame of the previous cycle by one
position. This is corrected by right-multiplying the matrix $P_{T-1}$
by a matrix $P_a$, which left-shifts each state by one position.
Letting $\tilde{\mathbf{n}}$ denote $\mathbf{n}$ shifted left by one
(see Equation 6), the elements of $P_a$ are defined as:

$$ [P_a]_{i,j} =
\begin{cases}
1 & \text{if } f^{-1}(j) = \widetilde{f^{-1}(i)} \\
0 & \text{else.}
\end{cases} \qquad (11) $$

The state transition matrix that describes transitions from
$\mathbf{n}^{(t)}$ to $\mathbf{n}^{(t+1)}$ when $\phi^{(t)} = T - 1$ is
then simply the product $P_{T-1} P_a$.
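
The aging step is simple enough to write out directly. In the sketch
below, the helper names are hypothetical and the states are indexed in
the order produced by `itertools.product`, which is an arbitrary choice;
any one-to-one index map would serve.

```python
from itertools import product


def age(n):
    """Left-shift the K-tuple: drop the oldest frame, append a new empty one."""
    return n[1:] + (0,)


def aging_matrix(N, K):
    """Return the list of states and the 0/1 matrix that applies age()."""
    states = list(product(range(N + 1), repeat=K))
    index = {n: i for i, n in enumerate(states)}
    Pa = [[0] * len(states) for _ in states]
    for n, i in index.items():
        Pa[i][index[age(n)]] = 1  # exactly one successor per state
    return states, Pa


states, Pa = aging_matrix(N=1, K=2)
for n, row in zip(states, Pa):
    print(n, row)
```

Each row contains a single 1, so multiplying by this matrix moves
probability mass without changing its total.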

We can now use the one-step state transition probabilities in order to
find the steady-state behavior of $S = \{S_0, S_1, S_2, \ldots\}$.
Because erasures are independent, $S$ is a discrete-time Markov chain.
In other words, the probability of being at some state $s_{t+t_1}$ in
the future does not depend on any past knowledge of the process
$S_{t-t_2}$ if we know the current state $s_t$. The only factors that
affect transitions from $s_t$ to $s_{t+t_1}$ are the transmission
policy and the erasure rate. Their influence can be summarized as
follows: $\epsilon$ affects the chance that $\mathbf{n}$ will change,
and $\pi$ determines how it changes.

Because $S$ includes phase information, $S$ is also cyclostationary
with period $T$. This is because it is not possible to go from a state
$s$ at time $t$ to the same state $s$ in less than $T$ steps. The
process $S_\phi = \{S_\phi, S_{\phi+T}, S_{\phi+2T}, \ldots\}$,
$\phi \in \{0, \ldots, T - 1\}$, is a stationary process, however. Its
$M \times M$ state transition matrix $P^{(\phi)}$ is derived from
Equations 10 and 11:

$$ P^{(\phi)} = P_\phi P_{\phi+1} \cdots P_{T-1} P_a P_0 P_1 \cdots P_{\phi-1}. \qquad (12) $$

A stationary distribution of $\{S_\phi, S_{\phi+T}, S_{\phi+2T}, \ldots\}$
can be found by analyzing the matrix of Equation 12. Let $\mu$ be the
stationary distribution when the oldest live frame expires, i.e.,
$\phi = L - (K - 1)T$. The probability $\nu_i$ of transmitting the $i$
most important layers of a frame by the time it expires is calculated
by summing out the possible states of the other $K - 1$ frames:

$$ \nu_i = \sum_{n_2=0}^{N} \sum_{n_3=0}^{N} \cdots \sum_{n_K=0}^{N} \mu_{f([i, n_2, n_3, \ldots, n_K])} \qquad (13) $$

Note that although there are $M = (N + 1)^K$ possible values of the
$K$-tuple $\mathbf{n}$, the number of feasible values may actually be
lower. Which states are unfeasible will depend on the policy $\pi$. For
example, if $\pi$ dictates that the most important layer of the oldest
live frame is always chosen, then it is not possible to have
$n_3 \neq 0$ if $n_2 < N$, since transmission of any message of the
third oldest frame would not commence until all messages from the
second oldest frame had been sent. The transmission policy will have no
rule associated with these states. To get around this problem, we can
remove each unfeasible state $\mathbf{n}_u$ from the analysis, and thus
have $M' < M$ states. Alternatively, we can still keep $M$ states and
assign a probability of 1 to the $[f(\mathbf{n}_u), f(\mathbf{n}_u)]$
entries of each $P_\phi$ matrix defined by Equation 9. The stationary
probability $\mu_{f(\mathbf{n}_u)}$ of these states will then be 0
because they are null-recurrent, and thus their presence will not
change the result of Equation 13.

Finally, given a rate-distortion function D(R) such that D(i) is the distortion incurred in reconstructing a frame from its i highest priority layers, 0 ≤ i ≤ N, we can compute the average distortion per frame for a transmission policy $\pi$ by

$$D_\pi = \sum_{i=0}^{N} \nu_i D(i). \tag{14}$$

Equation 14 can be interpreted as a weighted sum: the distortion D(i) of reconstructing a frame from i layers is weighted by the probability $\nu_i$ that only i layers of the frame are successfully sent in time for playback. One constraint on these weights is that the expected number of layers transmitted, $N_{\text{avg}}$, cannot exceed either the channel capacity C or the raw transmission rate R:

$$N_{\text{avg}} = \sum_{i=1}^{N} \nu_i \, i \le \min(C, R) = \min\left((1-e)T,\, N\right), \tag{15}$$

where C = (1 − e)T is a basic information theoretic result on the capacity of a binary erasure channel [9]. Even when the rate R is less than the channel capacity C, the bound of Equation 15 may still be unachievable because the data is time-constrained. Thus although on average there may be enough channel capacity to send the entire multimedia signal, in the short term there may be a sequence of many consecutive erasures so that data expires before it is successfully transmitted. The choice of policy can affect both $N_{\text{avg}}$ and the distribution of the rate-distortion weights ($\nu_i$). For example, a policy always favoring the most important message of the oldest live frame will maximize the average number of expected layers $N_{\text{avg}}$ and the chance of sending all layers across ($\nu_N$). Comparatively, with a policy that sends any messages belonging to the most important layer ahead of all others, the chance of sending at least one message in a frame ($\nu_1$) is maximized by reducing the chances of both getting none ($\nu_0$) and getting all of them ($\nu_N$). Which policy is better will depend not only on how each affects $\nu$ (which is also dependent on T and L), but also on the shape of the rate-distortion curve D(R).
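To make the weighted sum of Equation 14 and the bound of Equation 15 concrete, here is a minimal sketch. The function names and the example numbers in the usage note are illustrative assumptions, not values from the paper.

```python
def average_distortion(nu, distortion):
    """Equation 14: sum of nu[i] * D(i) for i = 0..N."""
    return sum(v * d for v, d in zip(nu, distortion))


def average_layers(nu):
    """Left side of Equation 15: expected number of layers delivered per frame."""
    return sum(i * v for i, v in enumerate(nu))


def layer_bound(e, T, N):
    """Right side of Equation 15: min(C, R) with C = (1 - e) * T and R = N."""
    return min((1.0 - e) * T, N)
```

For example, with the made-up weights `nu = [0.1, 0.2, 0.7]` and normalized distortions `[1.0, 0.1, 0.0]`, the average distortion is about 0.12 and the expected layer count is about 1.6, which is below the bound of 2 obtained for e = 0.2, T = 4, N = 2.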
2.3  Analytical  Results

Given $D_\pi$ and our Markov chain model, we can formulate an optimization that computes the best transmission policy:

$$\pi^* = \arg\min_{\pi \in \Pi} D_\pi, \tag{16}$$

where $\Pi$ is the set of all possible policies.

In this section we apply our Markov chain analysis to determine $\pi^*$ for a given set of static network conditions. Because our analysis depends on the erasure rate e, the relative importance of different priority layers (determined by a rate-distortion function D(R)), the lifetime L, and inter-frame period T, the optimal policy $\pi^*$ will depend on these factors as well. The steps for finding $\pi^*$ can be summarized as follows: first, fix the four aforementioned parameters; next, calculate the distortion of every possible policy; and finally, select the policy that produces the minimum distortion.

In our analysis we focus on the case that there are two layers (N = 2) and that T and L are such that there is a maximum of two frames alive at any time (K = 2). The reasons for this choice are two-fold: first, it simplifies the discussion and interpretation of our results; and second, it reduces the computational complexity of our analysis, since the size of our state space, $(N+1)^K$, grows quickly with N and K. With N = 2 and K = 2 there are at most 4 messages to choose from at any time: the two layers $X_1^{i}$ and $X_2^{i}$ of an older frame i, and the two layers $X_1^{i+1}$ and $X_2^{i+1}$ of the next, newer frame i + 1. In order to emphasize the priority and age of these 4 messages, we introduce the following variable names that will be used throughout this discussion:

•  $O_H = X_1^{i}$: the high priority layer of the older frame

•  $O_L = X_2^{i}$: the low priority layer of the older frame

•  $N_H = X_1^{i+1}$: the high priority layer of the newer frame

•  $N_L = X_2^{i+1}$: the low priority layer of the newer frame

In the introduction we explained that non-obvious transmission decisions arise when we must choose between older, less important messages and newer, more important messages. For the two-layer, two-frame case, this situation arises in only one of the 9 possible values of n: n = [1, 0]. This is the case in which $O_H$ was successfully transmitted, and so either $O_L$ or $N_H$ must be chosen next. Each policy that we consider in this section consists of a distinct choice of $O_L$ or $N_H$ for each phase in the cycle such that the older frame has not yet expired. In other words, $0 \le \phi < L - T$, so there are $2^{L-T}$ possible policies to consider.
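The exhaustive search over these policies can be sketched in a few lines for the two-layer, two-frame case. The sketch below rests on several assumptions that are not spelled out in this section: one message is sent per time slot, each is erased independently with probability e, feedback is instantaneous, T < L ≤ 2T, and in every state other than n = [1, 0] the sender serves the older frame first whenever it is still alive and incomplete. All names are hypothetical, and the sketch is not guaranteed to reproduce the paper's figures.

```python
from collections import defaultdict
from itertools import product

OL, NH = "OL", "NH"  # the two candidate messages in state n = [1, 0]


def next_target(a, b, phi, policy, overlap):
    """Pick which frame's next layer to send: 'old', 'new', or None (idle)."""
    if phi < overlap and a < 2:           # older frame alive and incomplete
        if (a, b) == (1, 0):              # the only non-obvious state
            return "old" if policy[phi] == OL else "new"
        return "old"                      # assumed default: older frame first
    return "new" if b < 2 else None


def marginal(dist, axis):
    """Distribution of one component of the (old, new) layer-count pair."""
    out = [0.0, 0.0, 0.0]
    for state, p in dist.items():
        out[state[axis]] += p
    return out


def layer_distribution(policy, e, T, L, cycles=500):
    """Return [nu_0, nu_1, nu_2] for a frame at its expiration time."""
    overlap = L - T
    assert 1 <= overlap <= T and len(policy) == overlap
    old = [1.0, 0.0, 0.0]                 # older frame's layer count at phase 0
    nu = old
    for _ in range(cycles):               # iterate cycles toward steady state
        dist = {(a, 0): p for a, p in enumerate(old)}
        nu = None
        for phi in range(T):
            if phi == overlap:            # older frame expires at this phase
                nu = marginal(dist, 0)
            step = defaultdict(float)
            for (a, b), p in dist.items():
                target = next_target(a, b, phi, policy, overlap)
                if target is None:
                    step[(a, b)] += p
                    continue
                sent = (a + 1, b) if target == "old" else (a, b + 1)
                step[sent] += p * (1.0 - e)   # message delivered
                step[(a, b)] += p * e         # message erased
            dist = step
        if nu is None:                    # overlap == T: expiry at cycle boundary
            nu = marginal(dist, 0)
        old = marginal(dist, 1)           # newer frame becomes the older one
    return nu


def distortion(policy, e, T, L, d1):
    """Average distortion with the normalization D(0)=1, D(1)=d1, D(2)=0."""
    nu = layer_distribution(policy, e, T, L)
    return nu[0] * 1.0 + nu[1] * d1


def best_policy(e, T, L, d1):
    """Exhaustive search over all 2**(L - T) phase-dependent policies."""
    return min(product((OL, NH), repeat=L - T),
               key=lambda pol: distortion(pol, e, T, L, d1))
```

A call such as `best_policy(0.3, 4, 8, 0.1)` returns a tuple with one choice per overlap phase. The fixed cycle count is a simple stand-in for a proper convergence test.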


We first present results illustrating the average distortion of various policies as a function of the erasure rate e, when the other parameters T, L, and the rate-distortion function D(R) are all held fixed. A non-intuitive result is that the optimal policy $\pi^*$ always belongs to a subset of two of the possible policies, and these two do not change their message choice for a state n as the frames get closer to their expiration times, i.e., as $\phi$ changes. There is a threshold value of e at which $\pi^*$ switches from one of these policies to the other. Another key result is that the best policy on one side of this threshold is also the worst policy on the other side. Next, we illustrate how the shape of D(R) can affect the value of this threshold; this tells us the best policy as a function of both D(R) and e. Finally, we examine how changing the values of L and T can also affect the best policy.

2.3.1  Effect  of  the  erasure  rate 

The erasure rate e affects the probability of successfully transmitting a message. In this section we examine how it affects the choice of $\pi^*$. For fixed values of L and T, the stationary distribution $\nu$ of a particular policy depends only on the erasure rate e. However, the optimal policy depends not only on $\nu$, and hence e, but also on the rate-distortion function D(R).

$D_\pi$, the distortion associated with a policy given by Equation 14, is a linear function of D(R). As a result, translating and/or positively scaling D(R) does not change which policy is optimal (i.e., has minimum distortion), since by Equation 14, and because the weights $\nu_i$ sum to 1, all policies’ average distortions will be equally scaled and translated. Therefore, we can find and apply a scaling a > 0 and translation b to any N-layer rate-distortion function to normalize it so that the resulting D'(R) = aD(R) + b satisfies D'(0) = 1 and D'(N) = 0. This new distortion function can be completely characterized by the N − 1 values of $d_i = D'(i)$, 0 < i < N, subject to the convexity constraints $(d_i - d_{i+1}) \ge (d_{i+1} - d_{i+2})$, $0 \le i < N - 1$. For the two-layer case, this means that the form of any rate-distortion function can be completely summarized by $d_1$. We will refer to $d_1$ as the “layer gap” for the two-layer case because it measures the gap in importance between the two layers. The convexity constraint $0 \le d_1 \le 0.5$ is necessary so that the high priority layer actually is more important than the low priority layer (or, at the minimum, equally important). If $d_1$ is close to 0.5 then both layers are of near equal importance; if $d_1$ is near 0 the high priority layer has much more benefit (distortion reduction) than the low priority layer.
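The normalization and the convexity check described above can be sketched as follows. The helper names are hypothetical, and the input is assumed to be a list of distortions D(0), ..., D(N) with D(0) > D(N).

```python
def normalize(distortion):
    """Apply D'(R) = a * D(R) + b so that D'(0) = 1 and D'(N) = 0."""
    d0, dn = distortion[0], distortion[-1]
    if d0 <= dn:
        raise ValueError("D(0) must exceed D(N)")
    a = 1.0 / (d0 - dn)      # positive scaling
    b = -a * dn              # translation
    return [a * d + b for d in distortion]


def is_convex(d):
    """Check (d[i] - d[i+1]) >= (d[i+1] - d[i+2]) for every valid i."""
    return all(d[i] - d[i + 1] >= d[i + 1] - d[i + 2]
               for i in range(len(d) - 2))
```

For a two-layer function such as `[10.0, 3.0, 2.0]` (made-up values), `normalize` yields a layer gap of 0.125, which satisfies the convexity constraint.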

To illustrate the effect of the erasure rate on the distortion of different policies, we fix the layer gap at $d_1 = 0.1$, the inter-frame period at T = 4, and the frame lifetime at L = 8. In this case the overlap in consecutive frames’ lifetimes lasts L − T = 4 seconds, and so there are a total of $2^4 = 16$ possible transmission policies. We found that of all possible policies, $\pi^*$ is always one of the two “phase-invariant” policies, which either always choose $O_L$ or always choose $N_H$ throughout the entire 4-second overlap window, regardless of the phase. In other words, if the chosen message ($O_L$ or $N_H$) is erased, these two policies always retransmit it until it succeeds or it expires, whichever comes first. The optimality of the phase-invariant policies can be seen in Figure 2, which displays the average distortion as a function of e incurred by these two transmission policies and a third, phase-varying “hybrid” policy. The hybrid policy shown favors $O_L$ for the first two transmissions in a cycle ($\phi \in \{0, 1\}$) and then switches to favor $N_H$ for the last two transmissions ($\phi \in \{2, 3\}$). The scale of the y-axis can be interpreted as follows: assuming that the distortion function is mean squared error, a one-decade decrease in distortion corresponds to a 10 dB increase in signal-to-noise ratio.


Figure 2: Distortion versus erasure rate for 3 different decision policies, for T = 4, L = 8, and $d_1 = 0.1$.

Figure 2 shows that for low values of e, the best policy always favors $O_L$; for high values of e, the best policy always favors $N_H$; and there is a value of e where the two policies have equal distortion (≈ 0.44 here). When the erasure rate is low, sending $O_L$ instead of $N_H$ is better because $O_L$ will expire sooner, and although this choice reduces the number of possible transmission attempts for the more important $N_H$, it is unlikely to need them all. However, as the erasure rate increases, so does the average number of attempts needed to get $N_H$ to the receiver, and hence sending $N_H$ before $O_L$ becomes more beneficial, even though $O_L$ may expire before the sender succeeds with $N_H$.

An equally important finding is that for values of e in which the $O_L$ phase-invariant policy is optimal, the $N_H$ phase-invariant policy is not only suboptimal, but it is also the worst possible policy. The converse holds true as well. In general, the distortion curves of the two phase-invariant policies form the upper and lower boundaries of a performance envelope between which all other policies’ performance curves must lie. Note that although we have only shown results from 3 of the 16 possible policies, we did find that the distortion curves of the other 13 all lie within the envelope formed by the curves of the two phase-invariant policies. Also, although we have not proven this optimal/worst nature of the two phase-invariant policies (it was identified through exhaustive search of all possible policies), we found that this property held true for all other combinations of $d_1$, T, and L that we examined.


2.3.2  Effect  of  the  layer  gap 

In the preceding section we found that when all parameters except the erasure rate were fixed, the optimal policy could be characterized by the threshold value of the erasure rate: if the erasure rate is below this threshold, $\pi^*$ always chooses $O_L$; if above, it chooses $N_H$. In this section we examine how the layer gap affects the value of this threshold. Figure 3 illustrates the location of this threshold (shown on the x-axis) as the layer gap is varied between 0 and 0.5 (y-axis), for an inter-frame period of 3 and a frame lifetime of 5 (T = 3, L = 5). Area A to the left of the curve indicates when the phase-invariant policy favoring $O_L$ is optimal; area B to the curve’s right indicates that the $N_H$ phase-invariant policy is optimal. The curve was obtained by analytically solving for the average distortions $D_\pi$ of the two policies as a function of e and $d_1$, setting them equal and solving for e as $d_1$ was numerically varied. We verified the correctness of the curve by sampling the e-$d_1$ plane, finding $\pi^*$ through exhaustive search, and confirming that the $O_L$-favoring policy was indeed optimal for all points lying in area A, and likewise for area B.
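The paper obtains the threshold analytically. As a purely numerical stand-in, the crossing point of two distortion curves can also be located by bisection, assuming the difference between the curves changes sign exactly once on the search interval. In the sketch below, `dist_ol` and `dist_nh` are hypothetical callables that return the average distortion of the two phase-invariant policies for a given erasure rate; the default interval avoids e = 0 and e = 1, where both policies may perform identically.

```python
def find_threshold(dist_ol, dist_nh, lo=0.01, hi=0.99, tol=1e-9):
    """Bisection on g(e) = dist_ol(e) - dist_nh(e) over [lo, hi]."""
    g_lo = dist_ol(lo) - dist_nh(lo)
    g_hi = dist_ol(hi) - dist_nh(hi)
    if g_lo * g_hi > 0:
        raise ValueError("the curves do not cross on the interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = dist_ol(mid) - dist_nh(mid)
        if g_lo * g_mid <= 0:     # crossing lies in the lower half
            hi = mid
        else:                     # crossing lies in the upper half
            lo, g_lo = mid, g_mid
    return 0.5 * (lo + hi)
```

Repeating the search for a range of layer-gap values would trace out a threshold curve of the kind shown in Figure 3.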

Figure 3 shows that as the layers become more equal in importance ($d_1$ increases), the erasure rate threshold moves to the right. This makes intuitive sense: if there is a small disparity between the layers’ importance, then $O_L$ is almost as beneficial as $N_H$, and thus unless the erasure rate is high it is better to send $O_L$ because it expires sooner. Conversely, if the high priority layer is much more important, then the erasure rate does not have to be as high before it makes sense to start favoring $N_H$ over $O_L$; this increases the chance of successfully transmitting this more important message.
Figure 3: Optimal decision policy as a function of e and $d_1$, for T = 3 and L = 5.

2.3.3  Effect  of  the  inter-frame  period 

We also examined the effect of the inter-frame period on the erasure rate threshold, and found that increasing T tends to increase the threshold. Figure 4 illustrates this effect; it shows threshold curves (as described in the preceding section) for each value of T from 3 to 6; L is set to T + 2 for all cases. Increasing T causes the curve’s knee to move further to the right; because increases in T provide more transmission opportunities per frame, the erasure rate must also increase before sending $N_H$ ahead of $O_L$ becomes more beneficial. We also see that as T increases, the spacing between the curves becomes smaller, but the slope of the knees remains relatively fixed. This indicates that changing T does not change the amount of influence the layer gap has on the erasure rate threshold value.




Figure  4:  Effect  of  T  on  the  optimal  decision  policy. 


Figure  5:  Effect  of  L  on  the  optimal  decision  policy.  T  = 
3  for  all  curves. 

2.3.4  Effect  of  the  frame  lifetime 

Finally, we examined how the frame lifetime L affects the location of the erasure rate threshold, and hence the optimal policy $\pi^*$. In general, we found that the lifetime has the following effects:

•  Increasing the lifetime moves the threshold curve to the left, so that it becomes more beneficial to send $N_H$ ahead of $O_L$ at even lower erasure rates. We hypothesize that this is because if the sender chooses to send the more important $N_H$ before the older $O_L$, increasing the lifetime increases the chance that $N_H$ is successfully transmitted before $O_L$ expires, thereby increasing the chance that $O_L$ can be sent as well.

•  Increasing the lifetime decreases the impact of the layer gap on the choice of $\pi^*$. This is reflected by a steeper threshold curve.

•  The magnitude of the difference in average distortion between the best and worst policies increases as the lifetime increases. This is because a longer lifetime results in a longer overlap between consecutive frames’ lifetimes, and hence the transmission policy can influence the sender’s behavior over a larger fraction of the T-second transmission cycle, for better or worse.

The first two properties are illustrated in Figures 5 and 6, which show the threshold curves for various values of L when T equals 3 and 4, respectively. In each figure L was varied between T + 1 and 2T. The third property was confirmed by examining $D_\pi(e)$ graphs (like Figure 2) for various lifetimes.
Figure 6: Effect of L on the optimal decision policy. T = 4 for all curves.

3  Adapting  to  Varying  Network 
Conditions 

Having shown that an optimal transmission policy can be found for a static set of model parameters, we now address the problem of casting our optimization onto a real streaming multimedia application. Typical streaming media applications do not change the frame length over time, so we continue to assume T is fixed and known to the sender. Similarly, the encoding process of a typical layered multimedia stream does not vary with time. As a result, the layer gap also remains fixed, and we presume it is known to the sender.² On the other hand, the lifetime L of the frames at the source depends on the delay in the network and at the receiver (playback buffering, processing, etc.), and these conditions change with time. Likewise, in the best-effort Internet, the erasure rate e varies with time. Consequently, our transmission policy must adapt to changes in L and e.

3.1  Generalizing  the  Static  Case 

Our goal is to leverage our results from the analysis of Section 2 and develop a more general transmission protocol that adapts the transmission policy to changes in L and e. For the two-layer (N = 2), two-frame overlap (K = 2) case, our results indicated that the optimal policy always gives preference to transmitting the less important message of the older frame ($O_L$) over the more important message of the newer frame ($N_H$) if e is less than a threshold value, and vice versa when e exceeds this value. The value of this threshold depends on the values of $d_1$, T, and L. Because we assume that $d_1$ and T are fixed throughout the transmission process, the only unknown factor influencing the threshold is the lifetime L. Let this threshold be th(L). Assume that th(L) is pre-computed for various values of L, and that the sender estimates the current values of the lifetime and erasure rate; we denote these estimates $\hat{L}$ and $\hat{e}$, respectively. Using these estimates, the sender can adaptively determine its transmission policy by comparing $\hat{e}$ to th($\hat{L}$). Sections 3.2 and 3.3 describe specific methods for calculating $\hat{L}$ and $\hat{e}$. The methods we describe are by no means the only techniques for estimating these quantities.

2  In reality, the true relative importance of the layers may change with time since multimedia signals are not stationary, especially over large time scales. Furthermore, since it is difficult to characterize perceptual distortion, it is difficult to know which value of $d_1$ accurately corresponds to subjective quality. With a real signal and today’s cost measures, $d_1$ is probably at best an estimate of the average relative importance of the layers.
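The threshold comparison that drives the adaptive policy might look like the sketch below. The table maps lifetimes to pre-computed thresholds; its numbers are placeholders, not results from the paper. Rounding the lifetime estimate down to the nearest tabulated value is an assumption made here for simplicity, and the function names are hypothetical.

```python
import bisect

# Placeholder table of th(L); real values would come from the static analysis.
EXAMPLE_TABLE = {5: 0.6, 6: 0.5, 7: 0.4}


def threshold_for(L_hat, table):
    """Return th(L) for the largest tabulated lifetime not exceeding L_hat."""
    lifetimes = sorted(table)
    idx = bisect.bisect_right(lifetimes, L_hat) - 1
    idx = max(idx, 0)             # clamp estimates below the table's range
    return table[lifetimes[idx]]


def choose_message(e_hat, L_hat, table):
    """In state n = [1, 0], favor O_L below the threshold and N_H otherwise."""
    return "OL" if e_hat < threshold_for(L_hat, table) else "NH"
```

Interpolating between table entries would be an equally plausible design; the lookup above simply keeps the example short.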

Because we assumed zero network delay in either direction in the analysis of Section 2, we evaluate our protocol’s performance in Section 4 under the same assumption. We do so because we wish to judge how well we can adapt the transmission policy when we are given a pre-computed table of the optimal policy under various sets of fixed network conditions. From Section 2 we know the optimal policy for static conditions when there is instantaneous feedback; we do not know it for the case of delayed feedback, in part because the delay causes an explosion in the state space of the system.³ Thus we restrict our investigation to the zero-network-delay case so as to accurately compute the optimal policy for a static L and e. A result of this is that the lifetime L is solely a function of delays at the receiver; for example, L could be the playback delay (defined as the difference between the time a frame is played out and when it first arrived at the source transmission buffer) less any processing delays at the receiver. We note, however, that our lifetime estimation technique described below in Section 3.2 works regardless of this zero-network-delay assumption. To measure its effectiveness while keeping the network delay at zero, we vary the size of the receiver’s playback buffer; this has the same effect on L as variations in network delays would.

3  For example, with instantaneous feedback the K-tuple n of how many layers of each frame had been transmitted is enough to completely characterize the message transmission state, because we can correctly assume that the first i successful messages of a frame corresponded to the i most important layers. Feedback delay means that lower priority messages may be successfully transmitted before high priority messages of the same frame, and so the overall system state must describe the state of every message of every live frame. These descriptions must indicate whether each message has been successfully transmitted, is awaiting transmission, or is in transit. Furthermore, if it is in transit we must keep track of whether it was successful or erased, and how much time is left before that information is sent back to the source. Such a state is needed for every possible combination of every possible component of each live layer. The result is an extremely large exponential increase in the total size of the state space necessary to fully describe the transmission process.

The  question  still  remaining  is  how  to  estimate  L 
and  e  in  order  to  compute  the  transmission  policy.  The 
next  two  sections  contain  our  approaches  to  these  esti¬ 
mation  problems. 

3.2  Estimating  the  Lifetime 

The  lifetime  L  is  defined  as  the  length  of  time  after  the 
arrival  of  a  frame  at  the  input  buffer  during  which  a  trans¬ 
mission  of  that  frame  will  arrive  at  the  receiver  in  time  for 
playback.  Our  purpose  in  estimating  L  is  two-fold: 

1.  The erasure rate threshold value that determines the optimal transmission policy depends on L; knowing L lets us accurately choose the best policy.

2.  An accurate estimate $\hat{L}$ not only prevents the source from transmitting layers of frames that would arrive too late to be useful (thus wasting bandwidth and possibly preventing transmission of other layers), it also maximizes retransmission opportunities by preventing the source from “giving up” too soon on data the source incorrectly believes will not reach the receiver in time.

Suppose that packet n contains a layer of frame i. The frame arrives at the source buffer at time $t_s[i]$ and is scheduled for playback at time $t_p[i]$; the packet is transmitted at time $t_x[n]$ and arrives at the receiver at time $t_r[n]$. The source lifetime L[n] of the data can be written as the following:

$$L[n] = t_p[i] - \delta_{\text{proc}}[i] - (t_r[n] - t_x[n]) - t_s[i] \tag{17}$$
$$\phantom{L[n]} = (t_p[i] - t_s[i]) - \delta_{\text{proc}}[i] - (t_r[n] - t_x[n]) \tag{18}$$
$$\phantom{L[n]} = \delta_{\text{play}}[i] - \delta_{\text{proc}}[i] - \delta_{\text{net}}[n], \tag{19}$$

where $\delta_{\text{proc}}[i]$ is the receiver’s processing delay for the ith frame, $\delta_{\text{net}}[n]$ is the one-way network delay from source to receiver experienced by packet n, and $\delta_{\text{play}}[i]$ is the delay in playback of the ith frame, defined as the absolute time difference between the playback of frame i and its arrival at the source’s transmission buffer. When there are no network or processing delays, $\delta_{\text{play}}$ is simply the amount of playback buffering done by the receiver.

One way to estimate L[n] is to estimate each of the delay components of Equation 19 individually. However, computing $\delta_{\text{play}}[i]$ and $\delta_{\text{net}}[n]$ at the source requires either synchronization of the source’s and receiver’s clocks or knowledge of the offset between them. Fortunately, we can avoid both of these requirements. When the source transmits a layer of frame i in packet n, it time stamps the packet with its estimate $\hat{t}_{tl}[n]$ of that frame’s time-to-live at the source:

$$\hat{t}_{tl}[n] = \left(t_s[i] + \hat{L}[n]\right) - t_x[n], \tag{20}$$

where $\hat{L}[n]$ is the current estimate of the source’s lifetime. The first term in the above equation is the latest possible time the source believes it can transmit a layer of frame i so that it arrives in time for playback. The actual time at which the packet is transmitted is subtracted from this sum to give an estimate of the remaining time-to-live for that frame. Besides stamping packet n with its estimate of the layer’s time-to-live, the source also stores the current lifetime estimate $\hat{L}[n]$ used to compute this estimate. At the receiver, the latest time the packet could arrive for playback is $t_p[i] - \delta_{\text{proc}}[i]$; hence the true remaining time-to-live $t_{tl}[n]$ of packet n when it arrives at the receiver at time $t_r[n]$ is:

$$t_{tl}[n] = t_p[i] - \delta_{\text{proc}}[i] - t_r[n]. \tag{21}$$

Upon  receiving  packet  n,  the  receiver  examines  the 
header  and  determines  which  layer  of  which  frame  the 
packet  contains,  the  frame’s  scheduled  playback  time, 
and  the  current  processing  delay.  If  the  receiver  does 
not  want  to  continuously  estimate  the  processing  delay 
(which  may  vary  with  the  system  load),  it  could  set  this 
delay  to  a  constant  or  0  to  ignore  it.  After  determining 
the  above  information,  the  receiver  calculates  the  frame’s 
actual  time  to  live  using  Equation  21,  and  computes  a 
residual  error: 

$$e[n] = t_{tl}[n] - \hat{t}_{tl}[n]. \tag{22}$$

The  receiver  sends  the  residual  back  to  the  sender  in  its 
ACK  for  packet  n,  and  upon  receiving  this  ACK,  the 
sender  calculates  the  actual  lifetime: 

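$$L[n] = \hat{L}[n] + e[n] \tag{23}$$
$$\phantom{L[n]} = \hat{L}[n] + \left(t_p[i] - \delta_{\text{proc}}[i] - t_r[n]\right) - \left(t_s[i] + \hat{L}[n] - t_x[n]\right) \tag{24}$$
$$\phantom{L[n]} = t_p[i] - \delta_{\text{proc}}[i] - (t_r[n] - t_x[n]) - t_s[i]. \tag{25}$$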

Equation 25 agrees exactly with the definition of L[n] given by Equation 17. Because the source and receiver exchange only relative times (differential as opposed to absolute times), clock synchronization is unnecessary. Receiver computations may be reduced by including the absolute times ($t_p[i] - \delta_{\text{proc}}[i]$) and $t_r[n]$ in the ACK instead of e[n], and letting the source calculate $t_{tl}[n]$ and e[n] itself.
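A small numeric sketch may help show why the exchange of Equations 20 through 23 does not need synchronized clocks. The function names and the timeline values (in milliseconds) are invented for illustration. The receiver clock is shifted by an arbitrary offset, and the measured lifetime comes out the same regardless.

```python
def stamp_ttl(t_s, t_x, L_hat):
    """Source side, Equation 20: estimated remaining time-to-live."""
    return (t_s + L_hat) - t_x


def residual(t_p, delta_proc, t_r, ttl_hat):
    """Receiver side, Equations 21 and 22, using only the receiver's clock."""
    ttl = t_p - delta_proc - t_r
    return ttl - ttl_hat


def measured_lifetime(L_hat, e_n):
    """Source side, Equation 23: corrected lifetime for the packet."""
    return L_hat + e_n


def round_trip(offset, L_hat=600):
    """Run one packet through the exchange with a receiver clock offset."""
    t_s, t_x = 1000, 1200                # source clock readings
    net_delay, delta_proc = 50, 20       # one-way delay, processing delay
    t_r = t_x + net_delay + offset       # arrival time on the receiver clock
    t_p = 1800 + offset                  # playback time on the receiver clock
    ttl_hat = stamp_ttl(t_s, t_x, L_hat)
    e_n = residual(t_p, delta_proc, t_r, ttl_hat)
    return measured_lifetime(L_hat, e_n)
```

With these numbers the true lifetime from Equation 17 is 1800 − 20 − 50 − 1000 = 730, and `round_trip` returns 730 for any offset and any initial estimate.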

Because the source does not receive the correction e[n] for the estimate $\hat{L}[n]$ until at least one round trip time after packet n was initially sent, it may have sent other packets in the interim. Suppose packet n’s ACK is received by the source before packet m is sent, but after packet m − 1 is sent (m > n). We could simply update the lifetime estimate to immediately account for the measurement error, i.e.

$$\hat{L}[m] = L[n] = \hat{L}[n] + e[n]. \tag{26}$$
However, if any of the delay components that define the lifetime (cf. Equation 19) vary rapidly (i.e., faster than one round trip time), then the actual lifetime L[n] of packet n may not be an accurate estimate for the lifetime L[m] of packet m. This could cause undesired behavior, like incorrect switching of the transmission policy in response to short term fluctuations in the network delay. To avoid this and to capture underlying long-term characteristics of the delays, we smooth the lifetime estimate as follows:

$$\Delta[m] = L[n] - \hat{L}[m-1] \tag{27}$$
$$\hat{L}[m] = \hat{L}[m-1] + \beta \Delta[m] \tag{28}$$
$$\phantom{\hat{L}[m]} = (1 - \beta)\hat{L}[m-1] + \beta L[n]. \tag{29}$$

In this simple IIR filtering process, $\beta$ determines the amount of smoothing performed: lower values result in more smoothing and a slower reaction to changes in L, and vice versa. The source uses the value of $\hat{L}$ defined by Equation 29 in order to estimate a layer’s time-to-live $\hat{t}_{tl}$ via Equation 20. The source must also use its lifetime estimate to determine the transmission policy and decide the expiration times of frames in the input buffer. However, because the estimate $\hat{L}$ may lag behind the actual lifetime L due to feedback delay and the IIR smoothing process, we do not base these decisions directly on $\hat{L}$, but instead on a value $L_{\text{dec}}$ derived from $\hat{L}$ that includes a safety margin. Much as TCP sets its retransmission timers not to the actual round trip time estimate but to a larger value dependent on the variance in the estimate, we compute $L_{\text{dec}}$ in an analogous fashion:

$$L_{\text{dec}}[m] = \mu \hat{L}[m] - \phi\, \sigma_{\hat{L}}[m], \tag{30}$$

where $\mu$ and $\phi$ are design parameters and $\sigma_{\hat{L}}$ is a measure of the deviation in the sequence of L[n] values. We calculate $\sigma_{\hat{L}}$ as in TCP:

$$\sigma_{\hat{L}}[m] = \sigma_{\hat{L}}[m-1] + \beta \left( |\Delta[m]| - \sigma_{\hat{L}}[m-1] \right). \tag{31}$$

After the source computes $L_{\text{dec}}[m]$, it updates the transmission policy and re-determines the expiration times of any layers still in its input buffer. To update the transmission policy, the source calculates a new erasure rate threshold as th($L_{\text{dec}}[m]$). The source computes the new expiration time of frame i as $t_s[i] + L_{\text{dec}}[m]$. Because we subtract off a fraction of the deviation from $\hat{L}[m]$ in Equation 30, $L_{\text{dec}}$ is more likely to be smaller than L than it is to be larger, especially when L varies rapidly. This in turn increases the chance that layers of an old frame that is close to its expiration time are treated as already expired, even though they could still reach the receiver in time. However, we chose to subtract the deviation in Equation 30 because it reduces the chance that we overestimate the lifetime and send useless, expired data.

The source updates $\hat{L}[m]$, $L_{dec}$, and the transmission
policy whenever it receives an ACK for a packet $n$.
If $n$'s ACK is lost then no update is performed, because
the source cannot accurately correct its estimates of the
time-to-live and $L[n]$.
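The update performed on each ACK can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation used in the simulations: the class and method names are hypothetical, the initial estimate is supplied by the caller, and the default parameter values are only illustrative.

```python
from dataclasses import dataclass


@dataclass
class LifetimeEstimator:
    """Smoothed lifetime estimate with a deviation-based safety margin."""

    l_hat: float         # smoothed lifetime estimate (Equation 29)
    sigma: float = 0.0   # smoothed absolute deviation (Equation 31)
    beta: float = 0.75   # filter gain; illustrative default
    mu: float = 1.0      # scale applied to the smoothed estimate
    phi: float = 1.0     # weight of the deviation term

    def on_ack(self, l_sample: float) -> float:
        """Fold in the lifetime sample carried by an ACK; return L_dec."""
        # Error between the new sample and the previous smoothed estimate.
        delta = l_sample - self.l_hat
        # IIR smoothing of the lifetime (Equations 28 and 29).
        self.l_hat += self.beta * delta
        # IIR smoothing of the absolute deviation (Equation 31).
        self.sigma += self.beta * (abs(delta) - self.sigma)
        # Decision lifetime with safety margin (Equation 30).
        return self.mu * self.l_hat - self.phi * self.sigma


estimator = LifetimeEstimator(l_hat=6.0)
for sample in [6.0, 10.0, 10.0, 10.0]:
    print(round(estimator.on_ack(sample), 3))
```

Note that no method is called when an ACK is lost, which mirrors the rule that the source performs no update in that case.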

3.3  Estimating  the  Erasure  Rate 

Since the best transmission policy depends on the packet
erasure rate, we must accurately estimate this parameter.
To do this, we employ a measurement protocol
similar to the window-based measurement technique
specified by the Real-time Transport Protocol
(RTP) [26]. Each packet is stamped with a sequentially
increasing packet number. Within a fixed window
of packets, the receiver records the number of packets
received. At the end of each interval defined by the window
size, the receiver computes how many packets were
expected (the difference between the sequence numbers
of the last successful packet and the first packet in the
window), and how many it actually received. The receiver
then calculates the loss rate over the interval, $\epsilon_{int}$,
and updates the overall loss rate estimate $\hat{\epsilon}$ via an IIR
low-pass filter:

$$\hat{\epsilon}_{new} = (1 - \alpha)\,\hat{\epsilon}_{old} + \alpha\,\epsilon_{int}. \qquad (32)$$

The level of smoothing caused by the filter depends on
the parameter $\alpha$: higher values result in less smoothing
and a faster response to changes in $\epsilon$, but they also produce
a noisier estimate than lower values, which smooth
more heavily.
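The window-based measurement and the filter of Equation 32 can be sketched as follows. This is an illustrative sketch only: the class name, the assumption that packets arrive in order without duplicates, and the choice to start the next interval at the packet after the last one counted are assumptions made here, not details given above.

```python
class ErasureRateEstimator:
    """Window-based loss measurement smoothed by a first-order IIR filter."""

    def __init__(self, window: int = 50, alpha: float = 0.125, start_seq: int = 0):
        self.window = window        # packets expected per measurement interval
        self.alpha = alpha          # filter gain of Equation 32
        self.estimate = 0.0         # smoothed loss-rate estimate
        self.start_seq = start_seq  # first sequence number of the interval
        self.received = 0           # packets received in the interval

    def on_packet(self, seq: int) -> float:
        """Record the arrival of packet `seq`; return the current estimate."""
        self.received += 1
        # Packets the source must have sent since the interval began.
        expected = seq - self.start_seq + 1
        if expected >= self.window:
            interval_loss = 1.0 - self.received / expected
            self.estimate = ((1.0 - self.alpha) * self.estimate
                             + self.alpha * interval_loss)
            # Begin the next interval at the following sequence number.
            self.start_seq = seq + 1
            self.received = 0
        return self.estimate


estimator = ErasureRateEstimator(window=4, alpha=0.5)
for seq in [0, 1, 3, 4, 5, 6, 7]:   # packet 2 is erased
    print(seq, estimator.on_packet(seq))
```

Because the estimate changes only when a packet arrives, this sketch shares the weakness of the basic scheme: a loss burst at the end of an interval delays the update until the next successful packet.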

One problem with this window-based scheme is that if
a loss burst starts near the end of an estimation interval,
$\hat{\epsilon}$ will not be updated until the first successful packet
after the burst reaches the receiver. A solution is to add
a timer at the receiver. The receiver sets the timer for the
expected end of a measurement interval and, when the timer
expires, calculates the number of packets that could have
arrived and updates $\hat{\epsilon}$ accordingly. Besides adding
computational complexity to the receiver, this scheme
can also miscount losses if the network delay changes
in the middle of a measurement interval. Another alternative
is to use overlapping windows, which results
in more frequent measurement updates at the expense of
increased receiver complexity.

The receiver informs the source of the erasure rate
by putting the current value of $\hat{\epsilon}$ in every ACK. This
adds robustness to ACK loss. Alternatively, the source
can compute the loss rate in an analogous manner based on
ACKs. To prevent overestimated loss rates when ACKs
are lost, the receiver could send a bit vector indicating the
loss/success status of the most recently received packets.
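To illustrate why a bit vector protects the source-side estimate against ACK loss, the following sketch merges the vectors carried by ACKs into a per-packet status table. The ACK layout (a highest sequence number plus a list of flags for the packets just before it) and the function names are hypothetical; the text above does not specify a format.

```python
def merge_ack(status: dict, highest_seq: int, bits: list) -> None:
    """Merge one ACK into `status`; bits[k] is the status of packet highest_seq - k."""
    for offset, received in enumerate(bits):
        seq = highest_seq - offset
        if seq >= 0:
            status[seq] = received


def loss_rate(status: dict, first_seq: int, last_seq: int) -> float:
    """Fraction of packets in [first_seq, last_seq] known to have been erased."""
    known = [status[s] for s in range(first_seq, last_seq + 1) if s in status]
    if not known:
        raise ValueError("no status information for the requested range")
    return 1.0 - sum(known) / len(known)


# Packets 0..3 are sent; packet 1 is erased, and the ACK for packet 2 is
# lost on the reverse path. The ACK for packet 3 still reports packet 2.
status = {}
merge_ack(status, 0, [True])
merge_ack(status, 3, [True, True, False, True])
print(loss_rate(status, 0, 3))  # 0.25
```

A source that counted only the ACKs it received (for packets 0 and 3) would instead infer a loss rate of 0.5 over the same four packets.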




4  Performance  Evaluation 


To evaluate the performance of our adaptive protocol, we
simulated transmission of a multimedia signal over the
system of Figure 1. Before testing our full-fledged protocol,
we first focused on two of its components: the lifetime
and erasure rate estimation techniques. We evaluated
these techniques and tuned their parameters by measuring
their response to step changes. We then simulated
the complete protocol and compared its performance to
other protocols. To evaluate the efficacy of our lifetime
estimation technique, we compared our adaptive protocol
to various fixed-lifetime transmission protocols. To evaluate
the benefits of a layered encoding, we compared our
protocol to non-layered transmission schemes. Finally,
to evaluate the importance of adapting the transmission
policy, we compared our protocol with others
that adapt to lifetime changes but keep the transmission
policy static.

We simulated changes in the lifetime $L$ and erasure
rate $\epsilon$ by randomly changing their values at exponentially
distributed intervals of rate $\lambda_L$ and $\lambda_\epsilon$, respectively. At
the end of an interval we chose a new value of $L$ from
a uniform distribution over $[\bar{L} - \sigma_L, \bar{L} + \sigma_L]$. We varied the
erasure rate in a similar way, except that we set the erasure
rate to $\max(0, \min(\epsilon, 1))$, where $\epsilon$ is uniformly
distributed in the interval $[\bar{\epsilon} - \sigma_\epsilon, \bar{\epsilon} + \sigma_\epsilon]$, in order to restrict
it to the $[0, 1]$ interval. Thus $\bar{\epsilon}$ is the median value of the
erasure rate.
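The way the simulated parameters evolve can be sketched as a piecewise-constant random process. This is a minimal sketch, not the simulator itself: the function name, the fixed seed, and the numeric arguments are illustrative assumptions.

```python
import random


def piecewise_constant_process(center, half_width, rate, horizon, rng, clamp=None):
    """Return [(start_time, value), ...] covering the interval [0, horizon).

    A new value is drawn uniformly from [center - half_width, center + half_width]
    at the start of each segment; segment lengths are exponentially distributed
    with the given rate. If `clamp` is a (low, high) pair, values are clipped.
    """
    t = 0.0
    segments = []
    while t < horizon:
        value = rng.uniform(center - half_width, center + half_width)
        if clamp is not None:
            low, high = clamp
            value = max(low, min(value, high))
        segments.append((t, value))
        t += rng.expovariate(rate)
    return segments


rng = random.Random(1)
lifetimes = piecewise_constant_process(6.0, 2.0, 1 / 100, 2000.0, rng)
erasures = piecewise_constant_process(0.9, 0.25, 1 / 1000, 2000.0, rng,
                                      clamp=(0.0, 1.0))
print(len(lifetimes), len(erasures))
```

Clipping moves probability mass onto the boundary, so the center of the uniform range is in general no longer the mean of the clipped erasure rate, although it remains the median.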

Figure 7 illustrates the adaptivity of the control algorithm
that governs the evolution of $L_{dec}$. The x-axis
denotes time in terms of transmission slots, while the
y-axis indicates the value of both the actual lifetime
$L$ and the protocol lifetime $L_{dec}$. When $L$ changes,
$L_{dec}$ converges to the new value in about 10 transmission
slots. For a multimedia signal with 20 ms frames and
$T = 4$ transmission opportunities/frame, this would
correspond to a 50 ms convergence time. Figure 7 also
shows marks that indicate when packets are transmitted,
and whether they are successful or erased. $L_{dec}$ does not change
if a packet is erased.
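The 50 ms figure follows from the slot duration implied by the frame length. The symbols $t_{slot}$ and $t_{conv}$ below are introduced here only for this worked step:

```latex
\begin{align*}
  t_{slot} &= \frac{20\ \text{ms}}{T} = \frac{20\ \text{ms}}{4} = 5\ \text{ms}, \\
  t_{conv} &\approx 10 \cdot t_{slot} = 50\ \text{ms}.
\end{align*}
```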

Figure 8 shows the adaptation of the estimated erasure
rate $\hat{\epsilon}$ to a step change in the true value of $\epsilon$. A comparison
of Figure 8 to Figure 7 reveals that $\hat{\epsilon}$ is a much
noisier estimate than $\hat{L}$. Unlike the lifetime, the erasure
rate cannot be measured directly, and instead must be estimated
from packet loss. Figure 8 shows that
while $\hat{\epsilon}$ does correctly respond to the change in $\epsilon$, its accuracy
is only about ±0.05. The accuracy of the estimate
and its quickness to respond to change can be traded off
via the choice of $\alpha$ and the size of the loss window.

Figure 7: Lifetime adaptation to a step change for adaptation
parameters $\beta = 0.75$, $\phi = 1$, and $\mu = 1$. The
frame period is $T = 4$ and the erasure rate is $\epsilon = 0.45$.

In choosing values for the adaptation parameters
($\alpha$, $\beta$, $\mu$) to use in our adaptive protocol, we tried to
achieve a balance between speed of response and accuracy.
Table 3 lists the parameter values we chose for our
simulations. We ran these simulations for median erasure
rates ranging from 0.05 to 0.95. We compared
our layered adaptive protocol to three other types
of transmission schemes:

•  Layered transmissions with a fixed transmission
policy which always favored older messages over
newer messages, and which used fixed, non-adaptive
values for $L_{dec}$ (results are shown in Figure 9).

•  Non-layered transmissions with lifetime adaptation.
These policies always sent the oldest live frame. We
simulated two different policies: one with the same
frame size $T$ and one with half the frame size.
With the former, messages are twice as large
as in the two-layered case, and with the latter they are
of equal size (Figure 10).

•  A layered transmission with a fixed phase-invariant
transmission policy (always favoring the older $O_l$
over the newer $N_h$, or vice versa) but with full lifetime
adaptation (Figure 11).

We used a positive acknowledgment (ACK) feedback
scheme for all of the protocols. Losses were detected
by the source via gaps in the sequence of ACKs. In addition,
protocols adaptively estimating the lifetime and/or
the erasure rate included time-to-live corrections and/or
the current erasure rate estimate in the ACKs.

Figure 9 illustrates the benefits of adaptively estimating
the data lifetime. The adaptive policy performs
better at all erasure rates than the three fixed policies that
do not adapt to changes in $L$. At low erasure rates, the
fixed policy with $L = 4$ is worst because it underestimates
the lifetime and hence prevents retransmissions
of messages that it believes have expired even
though they have not. At high erasure rates more
retransmissions are necessary, and so the worst fixed policy
is $L = 8$ because it overestimates the data lifetime and
transmits expired messages.

Figure 8: Erasure rate adaptation to a step change for
$\alpha = 0.125$ and a loss window of 50 packets. The frame
period is $T = 4$ and the frame lifetime is set to $L = 6$.

Table 3: Simulation parameters.

The advantages of layering the data are illustrated
by Figure 10. The adaptive scheme is demonstrably better
than the non-layered schemes. For the non-layered
schemes, Figure 10 also shows there is an advantage to
halving the frame size in order to send smaller packets
more frequently. When the frame length is halved,
packets are sent twice as often, but their lifetime is unchanged.
As a result the short-term loss rate observed
during a frame's lifetime has a smaller variance, and this
reduces both the chance that more packets than average are
lost and not retransmitted in time and, conversely, the chance that
fewer packets than average are lost, which can leave
the transmission channel idle. As the erasure rate
increases, the smaller-frame benefit decreases because
the variation in the short-term loss rate becomes less
significant.
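The variance argument can be made concrete under the independence assumption used elsewhere in the analysis. As a rough sketch, suppose $n$ packets are sent during a frame's lifetime (the symbol $n$ and the binomial model are assumptions introduced here) and each is erased independently with probability $\epsilon$. The number of erasures $X$ and the observed short-term loss rate $X/n$ then satisfy:

```latex
\begin{align*}
  X &\sim \mathrm{Binomial}(n, \epsilon), \\
  \mathrm{Var}\!\left[\frac{X}{n}\right] &= \frac{\epsilon(1-\epsilon)}{n}.
\end{align*}
```

Halving the frame length while leaving the lifetime unchanged doubles $n$, which under this model halves the variance of the short-term loss rate.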

Parameter                  Value
Simulation Length          2 x 10^5 frames
Number of Layers (N)       2/frame
T                          4 s
L                          6 s
$\sigma_L$                 2 s
$\lambda_L^{-1}$           100 s
$\beta$                    0.75
$\mu$                      1
[illegible]                0.5
$\sigma_\epsilon$          0.25
$\lambda_\epsilon^{-1}$    1000 s
$\alpha$                   0.5
di                         0.1

Finally, Figure 11 compares the adaptive protocol
with two protocols which have fixed transmission policies
(either always favoring $O_l$ or $N_h$) but which do
adaptively estimate $L$ so as to know when the messages
expire. The adaptive protocol's performance when it has
perfect knowledge of $\epsilon$ (i.e., it is not estimated) is also
shown for comparison. Although the two adaptive protocols
do better than the $N_h$-favoring scheme at low erasure
rates and the $O_l$-favoring scheme at high erasure
rates, the performance improvement is small. Furthermore,
there are points at which the adaptive protocol
(even with perfect knowledge of $\epsilon$) does worse than one or
both of the fixed schemes. Two reasons for this are:

•  The policy that is optimal below the threshold value
of $\epsilon$ is the worst policy above that value, and vice
versa. As a result the adaptive protocol relying on
estimates of $\epsilon$ may incorrectly choose the worst possible
policy if the current estimate of $\epsilon$ is on the
wrong side of the threshold.

•  The $\epsilon$-threshold values which determine the adaptive
protocol's policy decisions are derived from a
steady-state analysis of fixed $L$ and $\epsilon$; in other
words, they describe the best policy when $L$ and $\epsilon$ are
fixed over a long period of time. Since these parameters
are changing throughout our simulation,
the system never settles into steady state, and so the
optimal policy over each short interval of time during
which $\epsilon$ and $L$ are constant is not necessarily
the same as the optimal steady-state policy for those
values.
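The first of these effects can be illustrated with a small sketch. It assumes a single threshold below which favoring the older, lower-priority layer is best and above which favoring the newer, higher-priority layer is best (an ordering inferred from the discussion of Figure 11), and it models the estimation error as uniform within a fixed accuracy band. The function names, the policy labels, and the uniform error model are all assumptions made here for illustration.

```python
def choose_policy(eps_estimate: float, eps_threshold: float) -> str:
    """Pick a phase-invariant policy from the estimated erasure rate."""
    if eps_estimate < eps_threshold:
        return "favor_old_low_priority"
    return "favor_new_high_priority"


def wrong_side_probability(eps_true: float, eps_threshold: float,
                           accuracy: float) -> float:
    """Probability that an estimate uniform in eps_true +/- accuracy
    falls on the opposite side of the threshold from eps_true."""
    low, high = eps_true - accuracy, eps_true + accuracy
    if eps_true < eps_threshold:
        wrong = high - eps_threshold   # estimates at or above the threshold
    else:
        wrong = eps_threshold - low    # estimates below the threshold
    return min(1.0, max(0.0, wrong / (high - low)))


threshold = 0.5                        # hypothetical threshold value
print(choose_policy(0.48, threshold))  # choice made with the true rate
print(choose_policy(0.53, threshold))  # choice made with a noisy estimate
print(wrong_side_probability(0.48, threshold, 0.05))
```

With an accuracy of about ±0.05, any true erasure rate within 0.05 of the threshold has a nonzero chance of selecting the policy that is worst for it under this model.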

Our simulation results of Figure 11 illustrate the difficulty
in adaptively determining the best transmission policy,
as well as the marginal benefits of our attempt to do
this. This finding applies to only one component of our streaming
multimedia protocol: the attempt to optimally adapt the
transmission policy. There are still substantial benefits to
be had from the other pieces, however. Specifically, Figure 10
illustrates that layering the media signal improves
signal quality by increasing the chance that the most important
parts are successfully sent, and Figure 9 illustrates
how adapting the message lifetime improves performance
by maximizing the number of retransmission
opportunities while still preventing useless transmissions
of expired messages.

Figure 9: Comparison of the fully adaptive scheme to three
fixed policies (always favoring older layers) which do not
estimate $L$.

5  Related  Work 

There has been a significant amount of work done on
streaming multimedia, and much of this is oriented towards
improving the performance of interactive multimedia
streams. Because the delay requirements of interactive
multimedia (less than 200 ms by some measures [6])
typically preclude soft ARQ for error recovery, interactive
multimedia research has explored alternative ways
to improve signal quality. Before discussing related
work on soft ARQ, we briefly review these alternative
techniques.

One such technique for interactive media is to adjust
the playback point at the receiver to compensate for
variations in network delay (jitter). Many algorithms
have been developed to automatically adjust the playback
point [15, 8, 24, 14, 19]. The common goal of these algorithms
is to keep the playback delay small without
causing signal dropouts stemming from frames that arrive
past their scheduled playback times. Because they
were developed for interactive multimedia, these algorithms
do not try to increase the playback buffer enough
to allow frames lost by the network (due to congestion)
to be retransmitted. Instead, two techniques, error
concealment (EC) and forward error correction (FEC),
have been frequently advocated to control errors resulting
from packet loss in streaming multimedia.

EC does not add any delay because it relies upon
the receiver to patch up missing packet(s) by concealing
the loss from the listener/viewer [28, 29, 13]. Perceptual
models can be exploited to carry out effective concealment,
but oftentimes simpler techniques are employed.
For example, in audio the receiver
can replace missing frames with silence, white noise, or
a repeat of the last successfully received frame. Although
EC requires no network or delay overhead, its
performance is comparatively limited by the lack of side
information about the missing data. Also, it is much
harder to conceal consecutive losses ("bursts") than isolated
ones. Hence EC is most useful in conjunction
with an error-correcting technique (such as ARQ or FEC)
which reduces the loss rate to a level low enough that EC
can effectively mask the unavoidable losses that
do occur.

Figure 10: Comparison of the layered scheme with adaptive
policy decisions to two non-layered schemes (which
need no transmission policy).

Forward  error  correction,  on  the  other  hand,  lowers 
the  effective  loss  rate  of  streaming  multimedia  by  adding 
redundant  information  to  the  original  source  data  while 
incurring  a  small  delay.  This  redundancy  is  commonly 
computed  using  algebraic  block  codes  [27,  3,  7],  but  re¬ 
cent  work  in  “signal  processing-based  FEC”  (SFEC)  has 
examined  adding  redundant  highly  compressed  versions 
of  the  media  signal  [13,  5,  4].  In  both  cases,  delay  can  be 
traded  for  robustness  to  error  bursts.  However,  both  tech¬ 
niques  also  result  in  increased  load  on  the  network  due 
to  the  overhead  of  the  redundancy.  Since  packet  losses 
are  usually  the  result  of  network  congestion,  the  addition 
of  FEC  can  actually  cause  losses  that  otherwise  would 
not  occur.  [22,  23]  showed  that  when  many  multime¬ 
dia  users  simultaneously  increase  their  rate  by  applying 
SFEC,  their  performance  actually  worsens. 

One disadvantage of FEC is that the error correction
is forward; because the source does not know a priori
which packets will be lost, it sends redundant information
even if it is not actually needed. ARQ retransmission
schemes, on the other hand, only send extra information
that the sender believes has been lost (see footnote 4); as a result
they do not unnecessarily waste bandwidth when there is
no packet loss, and they can easily adapt to changes in
the loss rates. Thus studies have examined soft ARQ for
both unicast [20, 11] and multicast [21, 31, 30] streaming
multimedia. One way our work differs from all of these
is that we assume there is an overall transmission rate
limit, so that a retransmission of one message can come
at the expense of the first transmission of another; these
other works assume that enough bandwidth is available
for any retransmissions the sender decides to send.

Figure 11: Comparison of adaptively changing the transmission
policy to two fixed transmission policies. The
data lifetime is adaptively estimated in all cases. Additionally,
the performance of an adaptive policy in which
the source has perfect information about $\epsilon$ is shown.

Footnote 4: Whether this belief is accurate depends on the specifics of the
protocol and network.

Methods for making traditional ARQ schemes such
as TCP more suitable to delay-constrained multimedia
are given in [20]. The authors propose the following enhancements
to a selective repeat-based retransmission scheme:
using gap-based loss detection at the receiver, as opposed
to timer-based techniques; playout buffering at the receiver
to allow retransmission attempts; implicit expiration
of data at the sender through knowledge of the
receiver's playback buffer size (to avoid both waiting
for ACKs and retransmitting packets that would arrive
late); and conditional retransmission, whereby the receiver
maintains an estimate of the RTT to the sender
and only requests a retransmission if it is expected to arrive
in time for playback. Our protocol incorporates
all of these ideas; the only significant difference is that
the sender controls retransmission decisions based on
the data lifetime estimate rather than relying on the receiver
to estimate the RTT and suppress retransmission
requests.
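The difference between the two suppression styles can be sketched as two small predicates. This is only an illustration under assumptions: the function and field names are hypothetical, and each layer is assumed to carry an expiration time on the sender's own clock that the sender refreshes whenever its lifetime estimate changes.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    frame: int
    layer: int
    expiration_time: float   # on the sender's clock


def receiver_should_request(time_until_playback: float, rtt_estimate: float) -> bool:
    """Receiver-side rule: ask for a repair only if a round trip fits before playback."""
    return rtt_estimate < time_until_playback


def live_layers(input_buffer: list, now: float) -> list:
    """Sender-side rule: keep only layers the sender believes can still arrive in time."""
    return [item for item in input_buffer if now < item.expiration_time]


buffer = [Layer(frame=7, layer=0, expiration_time=12.0),
          Layer(frame=8, layer=1, expiration_time=16.0)]
print(receiver_should_request(time_until_playback=0.08, rtt_estimate=0.12))
print([(x.frame, x.layer) for x in live_layers(buffer, now=13.0)])
```

The receiver-side rule depends on a round-trip estimate, while the sender-side rule depends only on the expiration times the sender maintains from its lifetime estimate.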

Retransmission schemes for interactive unicast non-layered
multimedia are also studied in [11], which focuses
on the impact of the playback delay
on the effectiveness of streaming multimedia. Using an
end-to-end voice transmission model and an empirical
measurement study, the authors conclude that there are
playback delays meeting the delay constraints of interactive
audio which still allow for a high probability of successful
retransmissions of single-packet losses. Unlike
our work, the authors do not consider multiple retransmissions
of frames that are lost multiple times.

A general examination of NACK-based
retransmission schemes for multicast real-time
media is given in [21]. The authors present an analysis
which indicates that not only are retransmissions both
useful and practical for real-time media, but in many
situations it is optimal for the source to immediately
multicast a retransmission upon the reception of a
NACK from any receiver. Although the average number
of packets sent is a factor in their optimality criteria,
they do not account for the potential impact of the
retransmissions on congestion, and hence on the loss rate.

Two specific retransmission-based schemes that
have been proposed for multicast streaming multimedia
are STructure-Oriented Resilient Multicast
(STORM) [30] and Layered Video Multicast with Retransmission
(LVMR) [31]. STORM is a NACK-based
technique that expands upon the Scalable Reliable Multicast
(SRM) [12] approach by adding local recovery and
a multi-leaf tree structure to a multicast non-layered multimedia
stream. Receivers send retransmission requests
via NACKs to a parent node that is selected based on
typical packet reception times and the receiver's own
playback buffer, which is assumed to be fixed and preselected.
However, because receivers request retransmissions
of lost frames as long as their scheduled playback
has not yet arrived, the source may retransmit lost
frames that will arrive too late for playback at the receiver.
LVMR, on the other hand, adds "smart retransmissions"
to the Reliable Multicast Transport Protocol
(RMTP) [16], so that a receiver sends a repair request to
a designated receiver only if the receiver estimates the
retransmission will arrive in time for playback. LVMR
also uses a layered transmission scheme that builds upon
Receiver-driven Layered Multicast (RLM) [18]. Unlike
our work, in which layering is used to control the order
in which data is sent, LVMR uses layering to control the
overall transmission rate and to adjust the playback delay
of each receiver. The idea is to allow more time for
retransmission when the receiver gets a smaller number
of frames per second.

The MESH protocol [17] is another framework for
ARQ-based error recovery of multicast streaming multimedia.
MESH assumes the multicast group is partitioned
into subgroups (which represent high performance local
area networks), and each subgroup elects an active receiver
(AR) to coordinate error recovery between subgroups.
[17] focuses on an ARQ-based mechanism for
error recovery between the ARs. An AR sends a repair
request to the AR with the lowest RTT that has not recently
experienced a similar loss pattern. Receivers also
suppress requests if the remaining time until playback
is less than the RTT estimate. Our protocol differs from
this approach (and LVMR) in that we use a sender-based
suppression mechanism that relies on a time-to-live measurement
based upon the one-way network delay. Because
the network delays and traffic loads in the paths
between the source and receiver need not be symmetric,
the variance in those delays may not be the same either.
A potential advantage of our source-based scheme is that
its data lifetime estimate (in terms of the latest allowable
time it could transmit a layer) needs only to account for
the forward transmission delay, and thus should have less
variance than a receiver-based round trip time estimate
which must account for delay in both directions.
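The variance claim can be stated as a short identity. Writing the round trip time as the sum of a forward delay $d_f$ and a reverse delay $d_r$ (symbols introduced here for illustration):

```latex
\begin{align*}
  \mathrm{RTT} &= d_f + d_r, \\
  \mathrm{Var}[\mathrm{RTT}] &= \mathrm{Var}[d_f] + \mathrm{Var}[d_r]
      + 2\,\mathrm{Cov}[d_f, d_r].
\end{align*}
```

If the two delays are uncorrelated, the covariance term vanishes and $\mathrm{Var}[\mathrm{RTT}] \ge \mathrm{Var}[d_f]$, with the gap equal to the variance of the reverse path. A strongly negative correlation between the two directions could reverse the inequality, so this is a tendency rather than a guarantee.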

[10]  describes  FLITT,  a  fast  lossy  Internet  image 
transmission scheme. FLITT is an FEC-based scheme for
transmission  of  layered  images  in  a  finite  amount  of  time 
(this  time  is  determined  by  a  transmission  rate  that  is 
fixed  for  the  image).  FLITT  starts  with  a  fixed-rate  lay¬ 
ered  encoding  of  the  image  (a  wavelet  transform  is  used); 
the  rate  is  unequally  allocated  to  each  layer  (by  adjusting 
the  quantizer  step  sizes)  so  that  the  total  image  distortion 
is  minimized.  Then  as  the  image  is  transmitted,  FLITT 
dynamically  allocates  the  total  rate  between  the  image 
(again  adjusting  the  quantizer  step  sizes)  and  FEC  (ad¬ 
justing  the  amount  of  redundancy).  More  FEC  and  quan¬ 
tization  bits  are  given  to  visually  important  layers.  Re¬ 
sults  indicate  that  FLITT  transmissions  of  a  lossy  version 
of  the  image  were  up  to  five  times  faster  than  TCP  trans¬ 
missions.  Although  FLITT  is  not  an  ARQ-based  pro¬ 
tocol,  it  is  an  example  of  a  joint  source-channel  coding 
scheme  that  incorporates  layering  to  adapt  to  changing 
network  conditions. 

The  tradeoff  we  analyzed  in  choosing  between  mes¬ 
sages  differing  in  both  priority  and  playback  deadlines 
has  analogies  to  delay-constrained  class-based  queuing, 
in which a switch must choose between packets of different
priorities (classes) with different deadlines. Bhattacharya
and Ephremides examined such queuing problems
in [1] and [2]. An important distinction is that
in  their  work,  the  arrival  times  of  packets  (i.e.,  produc¬ 
tion  times  of  layers)  are  random  and  geometrically  dis¬ 
tributed;  in  our  case,  we  have  known  deterministic  and 
periodic  arrival  times  of  messages. 


6  Future  Work 

In  our  analysis  of  the  2-layer  (N  =  2),  2-frame  overlap 
(K  =  2)  case  of  our  transmission  model  in  Section  2, 
we found that the optimal transmission policy was always
one of the two phase-invariant policies, and that the
other was the worst policy. Although we have observed
identical  results  for  all  other  values  of  L  and  T  when 
K  =  2  and  N  =  2,  the  cases  of  multiple  layers  ( N  >  2) 
and  multiple  overlaps  (K  >  2),  as  well  as  delay  in  the 
feedback,  still  remain  open  to  examination.  With  all  of 
these  issues  the  problem’s  state  space  grows  exponen¬ 
tially.  As  a  result  other  approaches  to  analyzing  the  prob¬ 
lem  should  be  explored,  such  as  Markov  decision  analy¬ 
sis  or  approximations  to  simplify  the  analysis. 

We also found that our fully adaptive protocol,
which switched the transmission policy based on the
current lifetime and erasure rate estimates, had few performance
benefits over fixed transmission policies that
adaptively found the data lifetime. Although this negligible
benefit stems from the time-varying case's non-stationarity
and the chance of using the worst policy
when the network estimates are inaccurate, it remains to be seen if these
factors have the same impact when there are multiple layers
and multiple overlaps. Using more layers gives the
sender finer control and granularity over what to transmit
and how those choices affect distortion. Combining
more layers with longer overlaps leads to many
more possible policies to choose from. Analysis should
be performed to determine whether the set of optimal policies
contains more than two extremes and, if so, whether this
results in more significant performance benefits from adapting
the transmission policy.

As discussed in Section 3.1, one limitation of our analysis
is the zero-network-delay assumption. Because this
assumption clearly does not hold in the Internet, it could
be eliminated in future work. A difficulty in accounting
for network delay is that the delay leads to an explosion
in the state space and, as a result, in the number of potential
transmission policies. Another limitation of our work
is our assumption that packet erasures are independent
events; Internet losses are often highly correlated. Future
work could account for correlation by incorporating
the network status into the state space (for example,
a 2-state Gilbert model could model losses).
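For readers unfamiliar with it, a 2-state Gilbert loss process can be sketched as follows. This is a generic illustration of the model, not part of the analysis above: the function name, the parameter names, and the choice that every packet sent in the bad state is erased are assumptions.

```python
import random


def gilbert_losses(n, p_good_to_bad, p_bad_to_good, rng,
                   loss_in_bad=1.0, loss_in_good=0.0):
    """Return a list of n booleans; True marks an erased packet.

    The channel starts in the good state and changes state before each
    packet with the given transition probabilities.
    """
    bad = False
    erased = []
    for _ in range(n):
        if bad:
            if rng.random() < p_bad_to_good:
                bad = False
        elif rng.random() < p_good_to_bad:
            bad = True
        p_loss = loss_in_bad if bad else loss_in_good
        erased.append(rng.random() < p_loss)
    return erased


rng = random.Random(7)
trace = gilbert_losses(200_000, 0.05, 0.45, rng)
# Long-run loss rate approaches p_good_to_bad / (p_good_to_bad + p_bad_to_good).
print(sum(trace) / len(trace))
```

With these parameters the stationary probability of the bad state is 0.05 / (0.05 + 0.45) = 0.1, and losses arrive in bursts whose mean length is 1 / 0.45 packets, unlike the independent erasures assumed in the analysis.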

As mentioned in Section 3.2, a potential advantage
of our data lifetime estimation is that it need only adapt
to variations of the forward network delay, in contrast
to RTT-based techniques which must adapt to variations
in both the forward and reverse delays. For asymmetric
connections this might be especially important. Another
area of future work is to study how much performance
advantage (e.g., in terms of correctly suppressing
requests that would arrive late and allowing requests that
will arrive in time), if any, our one-way estimation provides.

With  all  ARQ-based  schemes,  the  receiver  or  source 
must  determine  when  a  packet  is  lost.  Gap-based  detec¬ 
tion  schemes  cause  unnecessary  retransmissions  when 
packets  arrive  out  of  order  and  excessive  delay  when  a 
burst of packets is lost; timer-based schemes can cause
excessive  delay  or  unnecessary  retransmissions  when  the 
network  delay  changes  more  quickly  than  the  timer’s  es¬ 
timate.  We  are  currently  investigating  a  hybrid  scheme 
which  measures  the  level  and  frequency  of  both  reorder¬ 
ing  and  packet  loss  and  uses  this  information  to  adapt 
how  long  it  waits  before  determining  a  packet  has  been 
lost.  The  idea  is  that  if  reordering  is  observed  much  less 
often  than  packet  losses,  retransmissions  should  be  sent 
shortly  after  a  packet  gap  because  the  gap  corresponds 
to  a  loss  with  high  probability.  However,  if  the  relative 
level  of  reordering  rises,  retransmissions  after  gap  detec¬ 
tion  are  delayed  to  allow  time  for  reordered  packets  to 
arrive. 

Finally,  we  have  presented  a  scheme  which  can 
adapt  to  both  changes  in  network  delay  and  in  receiver 
buffering.  Prior  works  have  either  studied  the  perfor¬ 
mance  of  various  static  receiver  playback  buffer  sizes 
for  retransmission  or  looked  at  ways  of  minimizing  the 
playback  buffer  for  interactivity,  but  no  work  has  looked 
at  combining  the  two.  It  remains  an  open  area  of  re¬ 
search  to  dynamically  compute  the  playback  buffer  size 
as  network  conditions  change.  For  example,  an  algo¬ 
rithm  might  be  designed  to  compute  the  optimal  receiver 
playback  size  based  on  the  current  erasure  rate,  delay, 
and  two  curves  characterizing  a  media  application’s  per¬ 
formance  as  a  function  of  error  rate  and  of  delay.  Play¬ 
back  delay  could  be  traded  off  for  error  recovery  accord¬ 
ing  to  the  requirements  of  the  application. 

7  Conclusion 

We have developed a model for ARQ-based recovery of
streaming layered multimedia. A key result from our
analysis is that it is not always beneficial to favor transmission
of older, low priority layers over newer, higher
priority layers. We have applied the results of our analysis
to develop a retransmission protocol which adapts to
changes in the network erasure rate and in delay components
affecting the on-time delivery of the streaming data
(link delays, receiver buffering, processing delays). We
have introduced a novel scheme to prevent data transmissions
that will not arrive at the receiver in time for playback.
By relying on exchanges of "time-to-live" information
between the source and receiver, this scheme can
adapt to one-way network delay changes without requiring
the source and receiver's clocks to be synchronized. Finally,
although we did not find significant benefits from
adapting the transmission policy, we did show that there
are benefits from layering the media signal and from
adapting to delay changes.